Redundant array of independent disks

The goal of a redundant array of independent disks (RAID), originally known as a redundant array of inexpensive disks, is to provide reliable virtual disks that can be much larger than commonly available disk drives.

There are 7 official levels: RAID 0 to RAID 6. There can also be combinations of RAID levels; the most common combinations are RAID 10 and RAID 0+1.

RAID arrays are usually implemented with identically-sized disk drives.

Table of contents
1 Hardware vs. Software
2 RAID levels
2.1 RAID 0: Striped Disk Array without Fault Tolerance (Nonredundant)
2.2 RAID 1: Mirroring and Duplexing (Mirrored)
2.3 RAID 2: Error-Correcting Coding
2.4 RAID 3: Bit-Interleaved Parity
2.5 RAID 4: Dedicated parity drive (Block-Interleaved Parity)
2.6 RAID 5: Independent Data disks with distributed parity blocks (Block Interleaved Distributed Parity)
2.7 RAID 6: Independent Data Disks with Double Parity
2.8 RAID 10: A Stripe of Mirrors
2.9 RAID 0+1: A Mirror of Stripes
3 History

Hardware vs. Software

Any of the RAID levels listed below can be implemented in hardware or software.

With a software implementation, the operating system itself manages the disks of the array through the normal drive controller (IDE, SCSI, FC). This option can be slow, but it does not require the purchase of extra hardware.

A hardware implementation of RAID requires (at a minimum) a special-purpose RAID controller card. This controller handles the management of the disks and performs the parity calculations needed for RAID levels 3 through 6. This option tends to provide better performance and makes operating system support easier.

Hardware implementations also typically support hot swap, allowing failed drives to be replaced while the system is running.

RAID levels

RAID 0: Striped Disk Array without Fault Tolerance (Nonredundant)

RAID Level 0 requires a minimum of 2 drives to implement.

Characteristics and Advantages

RAID 0 implements a striped disk array: the data is broken down into blocks and each block is written to a separate disk drive. I/O performance is greatly improved by spreading the I/O load across many channels and drives.

Best performance is achieved when data is striped across multiple controllers with only one drive per controller. No parity calculation overhead is involved, and the design is very simple and easy to implement.
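The address arithmetic behind striping is simple modular arithmetic. The following Python fragment is only an illustrative sketch (the names are hypothetical); real implementations perform this mapping in the controller or in the operating system's block layer.

    def raid0_map(logical_block: int, n_disks: int) -> tuple[int, int]:
        """Map a logical block number to (disk index, block offset on that disk).

        Blocks are distributed round-robin across the disks, so consecutive
        logical blocks land on different drives and can be accessed in parallel.
        """
        disk = logical_block % n_disks             # which drive holds the block
        physical_block = logical_block // n_disks  # position on that drive
        return disk, physical_block

    # Example: with 4 disks, logical blocks 0..7 map to disks 0,1,2,3,0,1,2,3.
    print([raid0_map(b, 4) for b in range(8)])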

Disadvantages

Not a "true" RAID because it is not fault-tolerant. The failure of just one drive results in the loss of all data in the array. RAID 0 should never be used in mission-critical environments that involve modification of data. (Some applications work with control information stored on a RAID 1 or 5 filesystem and multimedia data stored on RAID 0 and backed up to tape or optical media.)

RAID 1: Mirroring and Duplexing (Mirrored)

For highest performance, the controller must be able to perform two concurrent separate reads per mirrored pair or two duplicate writes per mirrored pair.

RAID Level 1 requires a minimum of 2 drives to implement.

Characteristics

One write or two reads possible per mirrored pair. Twice the read transaction rate of single disks, same write transaction rate as single disks. 100% redundancy of data means no rebuild is necessary in case of a disk failure, just a copy to the replacement disk.

Transfer rate per block is equal to that of a single disk. Under certain circumstances, RAID 1 can sustain multiple simultaneous drive failures.

Simplest RAID storage subsystem design.
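As an illustrative sketch (a toy model, not a real driver), the duplicate-write and alternating-read behavior of a mirrored pair can be expressed in a few lines of Python; all names here are hypothetical:

    class Raid1Mirror:
        """Toy model of a two-disk mirrored pair."""

        def __init__(self, n_blocks: int, block_size: int = 512):
            self.disks = [bytearray(n_blocks * block_size) for _ in range(2)]
            self.block_size = block_size
            self._next = 0  # round-robin read scheduling

        def write(self, block: int, data: bytes) -> None:
            # A write is duplicated to both members of the mirrored pair.
            off = block * self.block_size
            for disk in self.disks:
                disk[off:off + len(data)] = data

        def read(self, block: int) -> bytes:
            # Reads alternate between the two disks; this is what lets a
            # mirrored pair service two concurrent reads.
            disk = self.disks[self._next]
            self._next ^= 1
            off = block * self.block_size
            return bytes(disk[off:off + self.block_size])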

Advantages

Since a disk of a mirrored pair has all the information, it can potentially be used without the RAID hardware/software.

Disadvantages

Highest disk overhead of all RAID types: 100% overhead, so only half of the total disk capacity is usable.

RAID 2: Error-Correcting Coding

The redundancy scheme in RAID Level 2 is a Hamming code, and the striping unit is a single bit. Striping at the bit level means that in a disk array with D data disks, the smallest unit of transfer for a read is a set of D blocks.
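As a sketch of the coding idea, using the small Hamming(7,4) code for concreteness (a real array would likely use a wider code), each of the seven code bits would live on a separate disk, so a failed disk shows up as a single-bit error that the code can locate and correct:

    def hamming74_encode(d: tuple[int, int, int, int]) -> list[int]:
        """Encode 4 data bits into a 7-bit Hamming codeword."""
        d1, d2, d3, d4 = d
        p1 = d1 ^ d2 ^ d4   # covers codeword positions 1, 3, 5, 7
        p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
        p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
        return [p1, p2, d1, p4, d2, d3, d4]  # codeword positions 1..7

    def hamming74_correct(c: list[int]) -> list[int]:
        """Locate and flip a single erroneous bit using the parity syndrome."""
        s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
        s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
        s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
        pos = s1 + 2 * s2 + 4 * s4  # 0 means no error; else 1-based position
        if pos:
            c[pos - 1] ^= 1
        return c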

RAID level 2 is rarely implemented.

RAID 3: Bit-Interleaved Parity

RAID level 3 has a single check (parity) disk, and because bit-level striping involves every drive in every access, the array can only process one I/O at a time.

RAID level 3 is rarely implemented.

RAID 4: Dedicated parity drive (Block-Interleaved Parity)

Characteristics

Disks are striped, as in RAID 0. Parity information for each stripe is calculated and stored on a dedicated parity disk. If one of the data disks fails, its contents are rebuilt onto a spare disk using the parity information. If the parity disk fails, the parity information is recalculated onto a spare disk.
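The parity used by RAID 4 (and RAID 5) is a bytewise XOR across the data blocks of a stripe; because XOR is its own inverse, any one missing block can be recomputed from the others. A minimal sketch in Python:

    from functools import reduce

    def xor_blocks(blocks: list[bytes]) -> bytes:
        """XOR equal-sized blocks together, byte by byte."""
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

    def parity(data_blocks: list[bytes]) -> bytes:
        # The parity block is the XOR of all data blocks in the stripe.
        return xor_blocks(data_blocks)

    def rebuild(surviving: list[bytes], parity_block: bytes) -> bytes:
        # The missing block is the XOR of every surviving data block
        # together with the parity block.
        return xor_blocks(surviving + [parity_block])

    # Example: lose the middle block of a 3-data-disk stripe, then recover it.
    stripe = [b"\x01\x02", b"\xf0\x0f", b"\xaa\x55"]
    p = parity(stripe)
    assert rebuild([stripe[0], stripe[2]], p) == stripe[1]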

Disadvantages

The parity drive can be a bottleneck during write operations, since every write, no matter which data disk it targets, must also update the dedicated parity drive.

RAID 5: Independent Data disks with distributed parity blocks (Block Interleaved Distributed Parity)

Every time a data "block" (sometimes called a "chunk") is written on a disk in an array, a parity block is generated within the same stripe. (A block or chunk often comprises many consecutive sectors on a disk, sometimes as many as 256 sectors. A series of chunks, one from each of the disks in an array, is collectively called a "stripe".) If another block, or some portion of a block, is written on that same stripe, the parity block (or some portion of the parity block) is recalculated and rewritten. The disk used for the parity block is staggered from one stripe to the next, hence the term "distributed parity blocks".
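Which disk carries the parity for a given stripe is a simple rotation. The sketch below assumes the common "left-symmetric" layout; real controllers may rotate in a different order, but every RAID 5 layout staggers parity in some such pattern:

    def raid5_parity_disk(stripe: int, n_disks: int) -> int:
        """Which disk holds the parity block for a given stripe.

        Parity rotates 'backwards' one disk per stripe in this layout,
        spreading parity writes evenly across all drives.
        """
        return (n_disks - 1) - (stripe % n_disks)

    # With 4 disks, parity lives on disks 3, 2, 1, 0, 3, 2, 1, 0
    # for stripes 0 through 7.
    print([raid5_parity_disk(s, 4) for s in range(8)])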

Interestingly, the parity blocks are not read on data reads, since this would be unnecessary overhead and would diminish performance. The parity blocks are read, however, when a read of a data sector results in a CRC error. In this case, the sectors in the same relative position within each of the remaining data blocks in the stripe, and within the stripe's parity block, are used to reconstruct the errant sector. The CRC error is thus hidden from the main computer. Likewise, should a disk fail in the array, the parity blocks from the surviving disks are combined mathematically with the data blocks from the surviving disks to reconstruct the data on the failed drive "on the fly". This is sometimes called Interim Data Recovery Mode. The main computer is unaware that a disk drive has failed, and reading and writing to the drive array continue seamlessly, though with some performance degradation. In RAID 5 arrays, which have only one parity block per stripe, the failure of a second drive results in total data loss.

RAID Level 5 requires a minimum of 3 drives to implement. The maximum number of drives is theoretically unlimited, but it is common practice to keep it to 14 or fewer for RAID 5 implementations with only one parity block per stripe. The reason for this restriction is that the likelihood of some drive failing grows with the number of drives in the array (the Mean Time Between Failures [MTBF] of the array as a whole becomes smaller). In implementations with more than 14 drives, RAID 5 with dual parity (also known as RAID 6) is sometimes used, since it can survive the failure of two disks.
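A back-of-the-envelope illustration of that restriction, assuming independent drive failures (the per-drive MTBF figure here is hypothetical):

    # Assuming independent failures, the expected time until *some* drive
    # in the array fails shrinks roughly linearly with the drive count.
    drive_mtbf_hours = 500_000  # hypothetical per-drive MTBF
    for n in (3, 7, 14, 28):
        print(n, "drives -> a drive failure expected about every",
              drive_mtbf_hours // n, "hours")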

Characteristics and Advantages

Highest read data transaction rate. Medium to poor write data transaction rate, especially when the host CPU performs software parity checking. Low ratio of ECC (Parity) disks to data disks means high efficiency. Good aggregate transfer rate.

Disadvantages

Disk failure has a medium impact on throughput. Most complex controller design. Difficult to rebuild in the event of a disk failure (as compared to RAID level 1). Individual block data transfer rate is the same as for a single disk. High overhead for small writes: to change 1 byte in a file, either the entire stripe must be read, the byte changed, the parity recalculated, and the entire stripe rewritten, or the old data and parity blocks must be read and both rewritten (a "read-modify-write"). However, the fact that file systems tend to address disks naturally in clusters partially hides this effect.
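The read-modify-write shortcut relies on XOR being self-inverse: XOR the old data out of the parity and the new data in. A sketch of the update rule:

    def rmw_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
        # new_parity = old_parity XOR old_data XOR new_data. This matches a
        # full recompute of the stripe's parity, but needs only two reads
        # (old data, old parity) and two writes (new data, new parity).
        return bytes(p ^ o ^ n
                     for p, o, n in zip(old_parity, old_data, new_data))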

RAID 6: Independent Data Disks with Double Parity

The entire data block is written to a data disk; parity is generated and written to two distributed parity strips on two separate drives.

RAID level 6 requires a minimum of four drives. With four drives its space efficiency merely equals that of RAID 1; five or more drives are needed to exceed it.

Characteristics

The most redundant parity array; very inefficient with a low count of drives, but much more fault-tolerant. Drives can be organized into orthogonal matrices, where rows of drives form parity groups, similar to RAID 5, while the columns also keep consistent parity data with each other. If a single drive fails, either its row or its column parity may be used to rebuild it. Several drives in any one column or row may fail before the array is corrupted, and any group of non-coincident drives may fail before the array is corrupted.
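As a sketch of the orthogonal scheme described above (many RAID 6 implementations instead use two Reed-Solomon-style parity computations), row and column parities over a grid of drives can each rebuild a single failed member:

    from functools import reduce

    def xor_all(blocks):
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

    def grid_parities(grid):
        """Row and column parities for drives arranged in an orthogonal matrix.

        grid[r][c] is the block held by the drive in row r, column c; a
        failed drive can be rebuilt from either its row or its column.
        """
        row_parity = [xor_all(row) for row in grid]
        col_parity = [xor_all([row[c] for row in grid])
                      for c in range(len(grid[0]))]
        return row_parity, col_parity

    # Example: rebuild the drive at row 0, column 1 of a 2x3 grid from its row.
    grid = [[b"\x01", b"\x22", b"\x30"],
            [b"\x0f", b"\x40", b"\x05"]]
    rows, cols = grid_parities(grid)
    assert xor_all([grid[0][0], grid[0][2], rows[0]]) == grid[0][1]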

RAID 10: A Stripe of Mirrors

Multiple RAID 1 mirrors are created, and a RAID 0 stripe is created over these. This is not one of the original levels but a combination of RAID 1 and RAID 0, which is why it is sometimes also called RAID 1+0.

Advantages

Can potentially handle multiple simultaneous disk failures, as long as at least one disk of each mirrored pair is working.

Otherwise it has the same advantages and disadvantages as RAID 1.

RAID 0+1: A Mirror of Stripes

Two RAID 0 stripes are created, and a RAID 1 mirror is created over them. This, too, is not one of the original RAID levels.

Disadvantages

Not as robust as RAID 1+0: the array cannot tolerate two simultaneous disk failures unless both failed disks belong to the same stripe.
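The difference in robustness can be seen by enumerating all two-disk failures in a toy four-disk example. The groupings below are hypothetical but typical; the sketch only illustrates the counting argument:

    from itertools import combinations

    # Four disks {0, 1, 2, 3}.
    # RAID 1+0: mirror pairs (0,1) and (2,3), striped across the pairs.
    # RAID 0+1: stripes (0,1) and (2,3), one stripe mirrored onto the other.

    def survives_10(failed: set) -> bool:
        # Survives as long as no mirror pair is completely lost.
        return not ({0, 1} <= failed or {2, 3} <= failed)

    def survives_01(failed: set) -> bool:
        # Survives only if at least one whole stripe is untouched.
        return not (failed & {0, 1}) or not (failed & {2, 3})

    # RAID 1+0 survives 4 of the 6 possible two-disk failures;
    # RAID 0+1 survives only 2 of them.
    for pair in combinations(range(4), 2):
        f = set(pair)
        print(pair, "RAID 1+0:", survives_10(f), " RAID 0+1:", survives_01(f))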

History

RAID was first proposed in 1988 by David A. Patterson, Garth A. Gibson and Randy H. Katz in the paper "A Case for Redundant Arrays of Inexpensive Disks (RAID)", published in the proceedings of the 1988 SIGMOD Conference, pp. 109-116. The term "RAID" originated with this paper.

The work was particularly ground-breaking in that, in hindsight, the concepts seem "obvious". The paper spawned the entire disk array industry.
