Short for Redundant Array of Independent Disks. A setup of multiple disks, combined to form a single logical unit. Data is stored redundantly across disks to improve reliability. Increases mean time to failure.
Protects against individual disk failures. Does not protect against total storage array failure. Data is often automatically replicated between 2 or more storage arrays.
Frequently combined NVRAM to improve write performance.
Mean Time To Failure
Section titled “Mean Time To Failure”Aka. MTTF. Higher is better.
Mean Time To Repair
Section titled “Mean Time To Repair”Aka. MTTR. Refers to the average time required to detect, replace, and rebuild a failed disk in a RAID array so the system returns to full redundancy. During this time, the system is vulnerable to additional failures. Lower is better.
Mean Time Between Failures
Section titled “Mean Time Between Failures”Aka. MTBF. Average time between failures of a system. Higher is better.
Hot Spare Disks
Section titled “Hot Spare Disks”Unallocated disks in RAID setup. Usually in a small number. Automatically replaced with a failed disk. Decreases MTTR.
Techniques
Section titled “Techniques”Striping
Section titled “Striping”When data is divided into blocks and distributed across multiple disks. Improves read/write speed by allowing parallel access.
Mirroring
Section titled “Mirroring”Aka. shadowing. Data is duplicated across multiple disks. If one disk fails, data can be retrieved from the other disk(s).
RAID Level
Section titled “RAID Level”Refers to different configurations of RAID. Each with its own method of data distribution and fault tolerance. 6 levels are available.
RAID 0
Section titled “RAID 0”Uses striping only. High speed. No redundancy. No fault tolerance. If 1 disk fails, all data is gone. 100% usable space. Used in non-critical systems where speed is prioritized over reliability.
RAID 1
Section titled “RAID 1”Uses mirroring only. Data is duplicated on 2 or more disks. Can survive 1 disk failure. Fast reads. More reliable. 50% usable space. Requires 2 block writes for each block write. Better write performance than RAID 5.
RAID 2
Section titled “RAID 2”Uses striping at bit level. A single disk store ECC bits. Too complex for implementation. Not used as much (RAID 3 is used instead). Memory-Style Error-Correcting-Codes (ECC) with bit striping.
RAID 3
Section titled “RAID 3”Uses striping at bit level. Corresponding parity bits are stored on a dedicated parity disk. Can recover 1 failed disk. XOR of bits used for parity. Faster data transfer. Less I/O operations per second because all disks must be synchronized for each read/write. Not used as much because bit-striping causes too much arm movement. RAID 5 is used instead.
RAID 4
Section titled “RAID 4”Uses striping at block-level. A dedicated disk stores block parity. Parity bit must be re-written for every block writes, which becomes a performance bottleneck. Not used as much (RAID 5 is used instead).
RAID 5
Section titled “RAID 5”Uses striping at block-level. Data and parity are distributed across all disks. Can survive 1 disk failure. Good read performance. Higher I/O rates than RAID 4. Avoids bottleneck of RAID 4. Moderate write performance because of extra parity calculation. Less cost compared to RAID 1. Requires 4 I/O operations (2 block reads, 2 block writes) for each block write.
Balances speed, cost, and fault tolerance. Used in general-purpose storage.
RAID 6
Section titled “RAID 6”Similar to RAID 5 but stores 2 parity blocks per stripe. Can survive 2 disk failures. Safer than RAID 5. Slower. Costlier. Slightly slower writes because of extra parity math. Not used as much, because RAID 1 or RAID 5 offer adequate protection for most applications.
Used in large-capacity arrays where rebuild times are long.
RAID 10
Section titled “RAID 10”Aka. RAID 1+0. Pairs disks for mirroring, then does striping. Can survive multiple disk fails as long as they are in different pairs. Very high read/write speed. Best mix of performance and reliability. Expensive.
Used in mission-critical systems.
RAID 01
Section titled “RAID 01”Aka. RAID 0+1. Creates a stripe set, then mirrors the set.
Mean Time To Data Loss
Section titled “Mean Time To Data Loss”Aka. MTTDL. Depends on MTTF, MTTR and the RAID level.
Suppose number of disks are used in a RAID array. , and are used instead of , and respectively for simplicity.
For RAID 0
Section titled “For RAID 0”For RAID 1
Section titled “For RAID 1”For RAID 5
Section titled “For RAID 5”For RAID 10
Section titled “For RAID 10”Extensions
Section titled “Extensions”RAID protects against disk failures, not data corruption.
Snapshots
Section titled “Snapshots”Frozen view before changes.
Replication
Section titled “Replication”Duplicate across sites. Can either be sync or async.
Zettabyte File System (ZFS) from Solaris adds end-to-end data integrity through checksums and self-healing (automatically fixes errors when redundancy exists) storage. Can find silent data corruption. Checksums are calculated per block for both data and metadata. Checksums are stored separately. When data is read, checksums are rechecked.
If the data is corrupted and the system is using mirrored or RAID-Z storage, ZFS can read a good copy from another disk and automatically repair the corrupted block.
ZFS also replaces partitions with dynamic storage pools (aka. zpools). Multiple file systems can dynamically share the pool.