What is RAID on a computer? RAID levels: a brief theoretical background

(+) : It has high reliability: it works as long as at least one disk in the array is functioning. The probability of two disks failing at once is equal to the product of the failure probabilities of each disk (assuming the failures are independent). In practice, if one of the drives fails, urgent measures should be taken to restore redundancy. To do this, with any RAID level (except zero), it is recommended to use hot spare drives. The advantage of this approach is maintaining constant availability.

(-) : The disadvantage is that you have to pay the cost of two hard drives, getting the usable volume of only one hard drive.

RAID 1 + 0 and RAID 0 + 1

A mirror across many disks is RAID 1 + 0 or RAID 0 + 1. RAID 10 (RAID 1 + 0) means that two or more RAID 1 arrays are combined into a RAID 0. RAID 0 + 1 can mean one of two options:

RAID 2

Arrays of this type are based on the Hamming code. Disks are divided into two groups, one for data and one for error correction codes; if data is stored on 2^n - n - 1 disks, then n disks are needed to store the correction codes. Data is distributed across the data disks in the same way as in RAID 0, i.e. it is split into small blocks according to the number of disks. The remaining disks store error correction codes, from which information can be restored if any hard disk fails. The Hamming method has long been used in ECC memory and allows single errors to be corrected and double errors to be detected on the fly.
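
To make the Hamming idea concrete, here is a minimal sketch of the classic Hamming(7,4) code: 4 data bits plus 3 check bits, which mirrors the 4 data disks and 3 code disks of the smallest RAID 2 layout. The function names are illustrative, and this basic variant only corrects a single flipped bit; detecting double errors as well would need one extra overall parity bit, which is not shown here.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword layout by 1-based position: p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (data bits, error position); position 0 means no error found."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # syndrome = 1-based index of the bad bit
    if position:
        c[position - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], position


word = hamming74_encode([1, 0, 1, 1])
damaged = list(word)
damaged[4] ^= 1  # simulate a single-bit error at position 5
recovered, where = hamming74_correct(damaged)
print(recovered, where)
```

The syndrome directly names the faulty position, which is what lets the correction happen on the fly rather than only after a disk is known to have failed.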

The advantage of a RAID 2 array is faster disk operations compared to a single disk.

The disadvantage of a RAID 2 array is that the minimum number of disks at which it makes sense to use it is 7. At that size it needs almost twice as many disks as the data alone would (for n = 3, the data is stored on 4 disks), so this type of array is not widespread. With about 30-60 disks, the overhead is 11-19%.


RAID 3

In a RAID 3 array, data is split into chunks smaller than a sector (bytes) or into blocks and distributed across the disks. One more disk stores the parity blocks. RAID 2 used several disks for this purpose, but most of the information on its control disks served on-the-fly error correction, whereas most users are satisfied with simple recovery after a disk failure, for which the information that fits on one dedicated hard drive is enough.

Differences of RAID 3 from RAID 2: impossibility of error correction on the fly and less redundancy.

Advantages:

  • high speed of reading and writing data;
  • the minimum number of disks to create an array is three.

Disadvantages:

  • an array of this type is good only for single-tasking work with large files, since the access time to a single sector, split across the disks, equals the longest of the access times of the individual disks. For small blocks, the access time is much longer than the read time.
  • a heavy load on the control disk, which makes it much less reliable than the data disks.


RAID 4

RAID 4 is similar to RAID 3, but the data is divided into blocks rather than bytes. This partially solves the problem of slow transfers of small amounts of data. Writing is slow because the parity for a block is generated during the write and stored on a single disk. Among widespread storage systems, RAID 4 is used on NetApp storage devices (NetApp FAS), where its shortcomings are eliminated by running the disks in a special group write mode defined by the WAFL internal file system used on these devices.

RAID 5

The main disadvantage of RAID levels 2 to 4 is that write operations cannot run in parallel, since a separate control disk stores the parity information. RAID 5 does not have this drawback. Data blocks and checksums are written cyclically to all disks of the array; the disk configuration has no asymmetry. Checksums here mean the result of an XOR (exclusive or) operation. XOR has a property that RAID 5 relies on: any operand can be replaced with the result, and applying XOR again yields the missing operand. For example, if a xor b = c (where a, b, c are three disks of the array) and a fails, we recover it by putting c in its place and XORing c with b: c xor b = a. This holds for any number of operands: a xor b xor c xor d = e. If c fails, e takes its place, and XOR gives c back: a xor b xor e xor d = c. This is what provides the fault tolerance of RAID 5. Storing the XOR result takes only one disk's worth of capacity, equal to the size of any other disk in the array.
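
The XOR property described above can be checked directly. The following is a minimal sketch, assuming four equally sized data blocks a, b, c, d and one parity block e; the helper name `xor_blocks` and the sample contents are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Four data "disks" and the parity block e = a xor b xor c xor d.
a, b, c, d = b"AAAA", b"BBBB", b"CCCC", b"DDDD"
e = xor_blocks([a, b, c, d])

# Disk c fails: put e in its place and XOR the survivors.
rebuilt_c = xor_blocks([a, b, e, d])
print(rebuilt_c)
```

The same function serves both for computing the parity and for rebuilding a lost block, which is why a single extra disk's worth of space is enough.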

(+) : RAID 5 is widespread, primarily because of its cost-effectiveness. The capacity of a RAID 5 array is calculated as (n - 1) * hddsize, where n is the number of disks in the array and hddsize is the size of the smallest disk. For example, for an array of 4 disks of 80 gigabytes each, the total capacity is (4 - 1) * 80 = 240 gigabytes. Writing to a RAID 5 volume consumes additional resources and performance drops, because extra calculations and write operations are required, but reading is faster than from a single hard drive, because data streams from several disks of the array can be processed in parallel.

(-) : RAID 5 performance is noticeably lower, especially on random writes, where it falls 10-25% below RAID 0 (or RAID 10), because more disk operations are needed: apart from full-stripe writes, the RAID controller replaces each server write with four operations, two reads and two writes. The drawbacks of RAID 5 show when one of the drives fails: the whole volume enters degraded (critical) mode, every read and write involves additional manipulations, and performance drops sharply. Reliability falls to that of a RAID 0 with the same number of disks (that is, n times lower than the reliability of a single disk). If another drive fails before the array is fully rebuilt, or an unrecoverable read error occurs on at least one other drive, the array is destroyed, and its data cannot be restored by conventional methods. Note also that rebuilding (recovering the data from redundancy) after a disk failure puts an intensive, continuous read load on the disks for many hours. This can cause one of the remaining disks to fail during the least protected period of the array's operation, and it can expose previously undetected read errors in cold data (data that is not accessed during normal operation, such as archived and inactive data), which raises the risk of failure during recovery. The minimum number of drives is three.
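
The extra work on a small write comes from keeping the checksum consistent. As a rough sketch, assuming plain XOR parity, the new parity can be derived from the old data block and the old parity alone, without touching the other data disks. The function names below are hypothetical and do not describe any particular controller.

```python
def xor_bytes(x, y):
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(p ^ q for p, q in zip(x, y))


def updated_parity(old_data, old_parity, new_data):
    """Parity after overwriting one data block: P' = P xor D_old xor D_new."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


# A stripe of three data blocks and its parity.
stripe = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = xor_bytes(xor_bytes(stripe[0], stripe[1]), stripe[2])

# Overwrite the middle block and update parity incrementally.
new_block = b"\xaa\x55"
parity_after = updated_parity(stripe[1], parity, new_block)

# Cross-check against recomputing parity over the whole stripe.
full_recompute = xor_bytes(xor_bytes(stripe[0], new_block), stripe[2])
print(parity_after == full_recompute)
```

Both routes give the same parity; the incremental one trades reads of the whole stripe for reads of just the old data and old parity.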

RAID 5EE

Note: not supported on all controllers. RAID level-5EE is similar to RAID-5E, but uses the spare drive more efficiently and rebuilds faster. Like RAID level-5E, this level stripes data and checksums across all drives in the array, and it offers enhanced protection and performance. With RAID level-5E, the capacity of the logical volume is reduced by the capacity of two physical drives in the array (one for checksums, one for the spare). In RAID level-5EE the spare drive is part of the array. However, unlike RAID level-5E, which keeps the spare as unpartitioned free space, RAID level-5EE interleaves the spare blocks with the checksum blocks. This allows data to be rebuilt quickly if a physical disk fails. With this configuration the spare cannot be shared with other arrays; if another array needs a spare disk, it must have a spare drive of its own. RAID level-5E requires at least four disks and, depending on the firmware level and their capacity, supports up to 8 or 16 disks. RAID level-5E requires specific firmware. Note: a RAID level-5EE array can contain only one logical volume.

Advantages:

  • 100% data protection
  • Larger usable disk capacity than RAID-1 or RAID-1E
  • Higher performance than RAID-5
  • Faster rebuild than RAID-5E

Disadvantages:

  • Lower performance than RAID-1 or RAID-1E
  • Support for only one logical volume per array
  • Inability to share a backup disk with other arrays
  • Not supported by all controllers

RAID 6

RAID 6 is similar to RAID 5 but more reliable: the capacity of 2 disks is allocated to checksums, and the 2 checksums are calculated using different algorithms. It requires a more powerful RAID controller. The array keeps working after the simultaneous failure of two drives, which protects against multiple failures. A minimum of 4 disks is required. RAID 6 usually reduces disk group performance by about 10-15% compared with RAID 5, because of the larger amount of processing for the controller (it must calculate the second checksum and read and overwrite more disk blocks on every block write).

RAID 7

RAID 7 is a registered trademark of Storage Computer Corporation, not a separate RAID level. The array is structured as follows: data is stored on the data disks, and one disk stores the parity blocks. Writes to the disks are cached in RAM, so the array requires a UPS; a power failure corrupts the data.

RAID 10

RAID 10 architecture diagram

RAID 10 is a mirrored array in which data is written sequentially to several disks, as in RAID 0. This architecture is an array of type RAID 0, the segments of which are RAID 1 arrays instead of individual disks. Accordingly, an array of this level should contain at least 4 disks. RAID 10 combines high resiliency and performance.

Current controllers use this mode by default for RAID 1 + 0: one disk is the main one, the second is its mirror, and data is read from them alternately. RAID 10 and RAID 1 + 0 can now be treated as two names for the same disk mirroring method. The claim that RAID 10 is the most reliable option for data storage is wrong: although this RAID level can keep data intact when half of the drives fail, the array is irreversibly destroyed when as few as two drives fail, if they belong to the same mirror pair.
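
How often do two failed drives land in the same mirror pair? The sketch below counts it by brute force under simplifying assumptions that are not from the article: two-way mirrors, disks 2k and 2k+1 forming a pair, and every pair of failed disks being equally likely. The function name is illustrative.

```python
from fractions import Fraction
from itertools import combinations


def fatal_two_disk_fraction(n_disks):
    """Share of two-disk failures that destroy a RAID 10 of n_disks drives.

    Assumes disks 2k and 2k+1 mirror each other and n_disks is even.
    """
    failures = list(combinations(range(n_disks), 2))
    fatal = [pair for pair in failures if pair[0] // 2 == pair[1] // 2]
    return Fraction(len(fatal), len(failures))


for n in (4, 6, 8):
    print(n, fatal_two_disk_fraction(n))
```

Under these assumptions the share works out to 1 / (n - 1), so a 4-disk RAID 10 is lost in one of three double failures, whereas an array with two checksum disks survives any double failure.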

Combined levels

In addition to the basic levels of RAID 0 - RAID 5, described in the standard, there are combined levels of RAID 1 + 0, RAID 3 + 0, RAID 5 + 0, RAID 1 + 5, which different manufacturers interpret each in their own way.

  • RAID 1 + 0 is a combination of mirroring and striping (see above).
  • RAID 5 + 0 is striping of level-5 volumes.
  • RAID 1 + 5 is a RAID 5 of mirrored pairs.

Combined levels inherit both the advantages and the disadvantages of their “parents”: adding striping in RAID 5 + 0 does not improve reliability at all, but it does improve performance. RAID 1 + 5 is probably very reliable, but not the fastest and, in addition, extremely uneconomical: the usable capacity of the volume is less than half the total capacity of the drives ...

It is worth noting that the number of hard drives in combined arrays will also change. For example, for RAID 5 + 0 use 6 or 8 hard drives, for RAID 1 + 0 - 4, 6 or 8.

Standard Level Comparison

Level | Number of disks | Effective capacity* | Fault tolerance | Advantages | Disadvantages
----- | --------------- | ------------------- | --------------- | ---------- | -------------
0 | from 2 | S * N | none | highest performance | very low reliability
1 | 2 | S | 1 disk | reliability |
1E | from 3 | S * N / 2 | 1 disk** | high data security and good performance | double cost of disk space
10 or 01 | from 4, even | S * N / 2 | 1 disk*** | highest performance and high reliability | double cost of disk space
5 | from 3 to 16 | S * (N - 1) | 1 disk | economy, high reliability, good performance | performance below RAID 0
50 | from 6, even | S * (N - 2) | 2 disks** | high reliability and performance | high cost and complexity of maintenance
5E | from 4 | S * (N - 2) | 1 disk | economy, high reliability, speed higher than RAID 5 |
5EE | from 4 | S * (N - 2) | 1 disk | fast data reconstruction after a failure, economy, high reliability, speed higher than RAID 5 | performance below RAID 0 and 1, the spare drive idles and is not checked
6 | from 4 | S * (N - 2) | 2 disks | economy, highest reliability | performance lower than RAID 5
60 | from 8, even | S * (N - 2) | 2 disks | high reliability, large data volume |
61 | from 8, even | S * (N - 2) / 2 | 2 disks** | very high reliability | high cost and complexity of the organization

* N is the number of disks in the array, S is the size of the smallest disk.
** Information will not be lost if all drives within the same mirror fail.
*** Information will not be lost if two disks fail within different mirrors.
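
The capacity column can be turned into a small calculator. This is only a sketch that copies the formulas from the comparison above as they are given (N disks, S the size of the smallest disk); the dictionary and function names are invented for illustration, and real controllers may reserve additional space.

```python
# Effective-capacity formulas as listed in the comparison (n disks of size s).
CAPACITY_FORMULAS = {
    "0": lambda n, s: s * n,
    "1": lambda n, s: s,
    "1E": lambda n, s: s * n / 2,
    "10": lambda n, s: s * n / 2,
    "5": lambda n, s: s * (n - 1),
    "50": lambda n, s: s * (n - 2),
    "5E": lambda n, s: s * (n - 2),
    "5EE": lambda n, s: s * (n - 2),
    "6": lambda n, s: s * (n - 2),
    "60": lambda n, s: s * (n - 2),
    "61": lambda n, s: s * (n - 2) / 2,
}


def usable_capacity(level, n_disks, smallest_disk_size):
    """Usable capacity for a RAID level, in the same units as the disk size."""
    return CAPACITY_FORMULAS[level](n_disks, smallest_disk_size)


# Four 80 GB disks at different levels.
for level in ("0", "10", "5", "6"):
    print(level, usable_capacity(level, 4, 80))
```

For four 80 GB disks this reproduces the (4 - 1) * 80 = 240 GB figure for RAID 5 and shows what the second checksum disk of RAID 6 and the mirroring of RAID 10 cost in capacity.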

Matrix RAID

Matrix RAID is a technology implemented by Intel in its chipsets starting with ICH6R. Strictly speaking, it is not a new RAID level (an analogue exists in high-end hardware RAID controllers); it allows one or several RAID 1, RAID 0 and RAID 5 arrays to be organized simultaneously on a small number of disks. For relatively little money, this gives some data increased reliability and other data high access speed and performance.

Additional RAID Features

Many RAID controllers are equipped with a set of additional functions:

  • Hot Swap
  • Hot Spare
  • Stability Check

Software RAID

RAID can be implemented not only in hardware but also entirely in software (drivers). For example, systems based on the Linux kernel have dedicated kernel modules, and RAID devices are managed with the mdadm utility. Software RAID has its own advantages and disadvantages. On the one hand, it costs nothing (unlike hardware RAID controllers, which are priced from $250). On the other hand, software RAID uses the resources of the central processor, and at times of peak load on the disk system the processor can spend a significant part of its power on servicing RAID devices.

The Linux 2.6.28 kernel (the last one released in 2008) supports software RAID of the following levels: 0, 1, 4, 5, 6, 10. The implementation allows RAID to be created on separate disk partitions, which is similar to the Matrix RAID described above. Booting from RAID is supported.

Further development of the idea of RAID

The idea of RAID arrays is to combine disks, each of which is regarded as a set of sectors, so that the file system driver “sees” a single disk and works with it without paying attention to its internal structure. However, you can achieve a significant increase in the performance and reliability of the disk system if the file system driver “knows” that it works not with one disk, but with a set of disks.

Moreover: if any of the disks in the RAID-0 is destroyed, all the information in the array will be lost. But if the file system driver placed each file on one disk, and at the same time the directory structure is properly organized, then if any of the disks is destroyed, only the files located on this disk will be lost; and files entirely located on the saved disks will remain available.

Daniel Olson, an employee of Y-E Data, the world's largest manufacturer of USB floppy drives, created as an experiment a RAID array of four

RAID stands for Redundant Array of Independent Disks, a “fault-tolerant array of independent disks” (earlier, the word Inexpensive was sometimes used instead of Independent). The concept of several disks united into a group that provides fault tolerance originated in 1987 in the seminal work of Patterson, Gibson and Katz.

Original RAID Types

RAID-0
  If we take RAID to mean “fault tolerance” (Redundant ...), then RAID-0 is “zero fault tolerance”, that is, none at all. The RAID-0 structure is a “striped drive array”. Data blocks are written in turn to all disks in the array, in order. This improves performance, ideally by a factor equal to the number of disks in the array, since writing is parallelized across several devices.
  However, reliability is reduced by the same factor, since data is lost if any of the disks in the array fails.
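
Striping can be pictured as a simple round-robin mapping. The sketch below assumes one block per stripe unit and disks numbered from zero; the function name is hypothetical.

```python
def locate_block(logical_block, n_disks):
    """Map a logical block number to (disk index, block offset on that disk)."""
    return logical_block % n_disks, logical_block // n_disks


# Logical blocks 0..5 spread over a 3-disk stripe set.
for block in range(6):
    disk, offset = locate_block(block, 3)
    print(f"logical block {block} -> disk {disk}, offset {offset}")
```

Consecutive blocks fall on different disks, which is where the parallelism comes from, and it is also why losing any one disk leaves holes throughout every large file.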

RAID-1
  This is the so-called “mirror”. Write operations are performed on two disks in parallel. Such an array is more reliable than a single disk, but performance improves only slightly (or not at all).

RAID-10
  An attempt to combine the advantages of the two RAID types and remove their inherent disadvantages. If we take a RAID-0 group with its increased performance and give each of its disks (or the entire array) “mirror” disks that protect the data from loss on failure, we get a fault-tolerant array whose performance is increased by striping.
  Today it is one of the most popular RAID types "in the wild".
  The downside is that we pay for all of these advantages with half the total capacity of the disks in the array.

RAID-2
  This remained a purely theoretical option. It is an array in which the data is encoded with an error-correcting Hamming code, whose redundancy allows individual damaged fragments to be recovered. Incidentally, various modifications of the Hamming code, as well as its descendants, are used when reading data from the magnetic heads of hard drives and in optical CD / DVD readers.

RAID-3 and 4
  “Creative development” of the idea of protecting data with a redundant code. The Hamming code is indispensable for a “permanently unreliable” stream full of continuous, poorly predictable errors, such as a noisy over-the-air communication channel. With hard disks, however, the main problem is not read errors (we assume that a working hard disk returns data in the form in which we wrote it) but the failure of a whole disk.
  For such conditions we can take the striping scheme (RAID-0) and, to protect against the failure of one of the disks, supplement the recorded information with redundancy that allows data to be restored when part of it is lost, allocating one additional disk for this.
  If any of the data disks is lost, we can restore the data stored on it by simple mathematical operations on the redundancy data; if the redundancy disk fails, we still have the data, read from a RAID-0 disk array.
  RAID-3 and RAID-4 differ in that the first interleaves individual bytes and the second interleaves groups of bytes, “blocks”.
  The main disadvantage of both schemes is the extremely low write speed, since each write operation updates the “checksum”, the redundancy block for the recorded information. Evidently, despite the striped structure, the write performance of a RAID-3 or RAID-4 array is limited by the performance of a single drive, the one that holds the “redundancy block”.

RAID 5
  An attempt to get around this limitation produced the next type of RAID, which, along with RAID-10, is now the most widely used. If writing to the “redundancy block” disk limits the entire array, let us spread that block across the disks of the array as well instead of dedicating a disk to it, so that redundancy updates are distributed across all disks of the array. That is, as with RAID-3 (4), to store N disks' worth of data we take N + 1 disks, but unlike types 3 and 4 the extra disk also stores data mixed with redundancy data, like the other N.
  Disadvantages? Of course there are some. The problem of slow writes was solved only partially. Writing to a RAID-5 array is still slower than writing to a RAID-10 array. But RAID-5 is more “cost-effective”: with RAID-10 we pay for fault tolerance with exactly half of the drives, and with RAID-5 with just one drive.

However, write speed decreases in proportion to the number of disks in the array (unlike RAID-0, where it only grows). This is because, when writing a data block, the array has to recalculate the redundancy block, for which it must read the remaining “horizontal” blocks of the stripe. That is, for one write operation an array of 8 disks (7 data disks + 1 additional) performs 6 reads into the cache (the remaining data blocks, needed to calculate the redundancy block), calculates the redundancy block from them, and makes 2 writes (the new data block and the overwritten redundancy block). In modern systems caching softens the problem, but lengthening a RAID-5 group still brings a corresponding decrease in write speed along with the proportional increase in read speed.
  The drop in write performance on RAID-5 sometimes provokes a curious extremism, for example http://www.baarf.com/ ;)
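
The operation count from the 8-disk example generalizes easily. The sketch below models only the scheme described above, in which the rest of the stripe is re-read to recalculate the redundancy block, and ignores caching; the function name is hypothetical.

```python
def stripe_reread_write_cost(n_disks):
    """Disk operations for one block write when the rest of the stripe is re-read.

    n_disks counts all disks in the group, e.g. 8 = 7 data + 1 redundancy.
    Returns (reads, writes).
    """
    data_disks = n_disks - 1
    reads = data_disks - 1  # every other data block in the same stripe
    writes = 2              # the new data block and the new redundancy block
    return reads, writes


for n in (4, 8, 16):
    reads, writes = stripe_reread_write_cost(n)
    print(f"{n} disks: {reads} reads + {writes} writes")
```

The read count grows linearly with the size of the group while the write count stays fixed, which is the proportional slowdown the text refers to.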

Nevertheless, since RAID-5 is the most efficient RAID structure in terms of disk consumption per linear megabyte, it is widely used where write speed is not the decisive parameter, for example for long-term data storage or for data that is mainly read.
  Note separately that expanding a RAID-5 array by adding a disk triggers a complete recalculation of the entire RAID, which can take hours, and in some cases days, during which the performance of the array drops dramatically.

RAID-6
  A further development of the RAID-5 idea. If we calculate additional redundancy by a different rule than the one used in RAID-5, we can keep access to the data even if two disks of the array fail.
  The price is one more disk for the second “redundancy block”. That is, to store N disks' worth of data we need N + 2 disks. The “mathematics” of calculating the redundancy blocks becomes more complicated, which slows writes even further compared to RAID-5, but reliability increases. In some cases it even exceeds the reliability of RAID-10. It is easy to see that RAID-10 also withstands the failure of two disks, but only if those disks belong to different mirror pairs rather than to the same one. And the probability of exactly that unlucky case cannot be discounted.

Further growth in the number of RAID types comes from “hybridization”: this is how RAID-0+1 appeared, which became the RAID-10 already discussed, as well as all sorts of chimeras such as RAID-51 and so on.
  Fortunately, they are not found in the wild and usually remain a “dream of reason” (except for the RAID-10 already described above).

The problem of increasing the reliability of information storage is always on the agenda. This is especially true for large data sets and databases on which the operation of complex systems in a wide range of industries depends. It is especially important for high-performance servers.

As you know, the performance of modern processors is growing steadily, and modern hard drives do not keep pace with it. A single drive, whether SCSI or, even worse, IDE, can no longer handle the tasks relevant today. You need many disks that complement each other, take over if one of them fails, store backups, and work efficiently and productively.

However, just having several hard drives is not enough: they need to be combined into a system that works smoothly and does not allow data loss in case of any disk-related failures.

Creating such a system must be taken care of in advance because, as the well-known proverb has it, waiting until trouble strikes is too late. You can lose your data irrevocably.

RAID can be such a system: a storage virtualization technology that combines multiple disks into one logical element. A RAID array is a redundant array of independent disks. It is usually used to improve performance and reliability.

What do you need to create a raid? At least two hard drives. Depending on the level of the array, the number of storage devices used varies.

What RAID arrays are there

RAID arrays are either basic or combined. The University of California, Berkeley proposed dividing RAID into the following specification levels:

  • Basic:
    • RAID 1 ;
    • RAID 2 ;
    • RAID 3 ;
    • RAID 4 ;
    • RAID 5 ;
    • RAID 6 .
  • Combined:
    • RAID 10 ;
    • RAID 01 ;
    • RAID 50 ;
    • RAID 05 ;
    • RAID 60 ;
    • RAID 06 .

Consider the most commonly used.

RAID 0

RAID 0 is intended to increase read and write speed. It does not increase storage reliability and in that sense is not redundant. It is also called a stripe (striping). Usually 2 to 4 disks are used.

Data is divided into blocks that are written to the disks in turn. Read/write speed increases by a factor that is a multiple of the number of disks. The drawback is an increased likelihood of data loss. It makes no sense to store databases on such disks, because any serious failure makes the whole array inoperable, since there are no recovery tools.

RAID 1

RAID 1 provides mirrored data storage at the hardware level. It is also called a mirror array: the data on the disks is duplicated. It can be used with 2 to 4 storage devices.

Write/read speed remains practically unchanged, which can be counted as an advantage. The array works as long as at least one of its disks is operational, but the capacity of the system equals the capacity of one disk. In practice, when one of the hard drives fails, you need to replace it as soon as possible.

RAID 2

RAID 2 uses the so-called Hamming code. Data is distributed across the hard drives as in RAID 0, and the remaining drives store error correction codes, from which information can be regenerated after a failure. This method makes it possible to detect and then correct errors on the fly.

Read/write speed is higher than with a single disk. The downside is the large number of disks needed before the redundancy overhead becomes reasonable, usually 7 or more.

RAID 3

In a RAID 3 array, data is split across all drives except one, which stores the parity bytes. The array is resistant to failures: if one of the drives fails, its contents are easily restored from the parity data.

Unlike RAID 2, it cannot correct errors on the fly. This array offers high performance and can be built from 3 or more drives.

The main drawback of such a system is the increased load on the disk that stores the parity bytes and the resulting low reliability of that disk.

RAID 4

In general, RAID 4 is similar to RAID 3, except that parity is stored in blocks rather than bytes, which increases the transfer rate for small amounts of data.

The drawback of this array is write speed, because parity is written to one single drive, as in RAID 3.

It looks like a good solution for servers where files are read more often than written.

RAID 5

RAID 2 to 4 share a drawback: write operations cannot be parallelized. RAID 5 eliminates this flaw. Parity blocks are written cyclically across all disks of the array, so parity is distributed and the disk configuration has no asymmetry.

At least 3 hard drives are used. The array is very common thanks to its versatility and cost-effectiveness: the more drives are used, the larger the share of disk space available for data. Speed is high thanks to data parallelization, but performance is lower than RAID 10 because of the larger number of operations. If one drive fails, reliability drops to that of RAID 0. Recovery takes a long time.

RAID 6

RAID 6 is similar to RAID 5, but reliability is improved by increasing the number of parity disks.

A minimum of 4 disks is required, along with a more powerful processor to handle the increased number of operations. A requirement that the number of disks be a prime number (5, 7, 11 and so on) applies only to some parity-coding schemes, not to RAID 6 in general.

RAID 10, 50, 60

Next come combinations of the levels mentioned earlier. For example, RAID 10 is RAID 0 + RAID 1.

They inherit the properties of their component arrays in terms of reliability, performance, the number of disks, and cost-effectiveness.

Creating a RAID array on a home PC

The advantages of creating a RAID array at home are not obvious: it is uneconomical, data loss is not as critical as on servers, and information can be protected by making periodic backups.

For this purpose you will need a RAID controller with its own BIOS and its own settings. In modern motherboards the RAID controller can be integrated into the south bridge of the chipset. Even on such boards you can attach another controller through a PCI or PCI-E slot. Examples are Silicon Image and JMicron devices.

Each controller can have its own configuration utility.

Consider creating a RAID array using the Intel Matrix Storage Manager Option ROM.

Move all data off your disks first; otherwise it will be erased while the array is created.

Enter the BIOS Setup of your motherboard and enable RAID mode for your SATA hard drives.

To start the utility, restart the PC and press Ctrl + I during POST. The program window shows a list of available disks. Choose the item that creates an array, then select the required array level.

Then, using the intuitive interface, enter the array size and confirm its creation.

© Andrey Egorov, 2005, 2006. TIM Group of Companies.

Forum visitors ask us: “What is the most reliable RAID level?” Everyone knows that RAID 5 is the most common level; however, it is by no means free of serious drawbacks that are not obvious to non-specialists.

RAID 0, RAID 1, RAID 5, RAID6, RAID 10 or what are RAID levels?

In this article I will try to characterize the most popular RAID levels and then give recommendations for using them. To illustrate the article, I built a diagram that places these levels in a three-dimensional space of reliability, performance, and cost-effectiveness.

JBOD (Just a Bunch of Disks) is a simple spanning of hard drives that is not formally a RAID level. A JBOD volume can be an array of one disk or a combination of several disks. The RAID controller does not need to perform any calculations to work with such a volume. In our diagram, the JBOD drive serves as the origin, or starting point: its reliability, performance and cost values coincide with those of a single hard drive.

RAID 0 (“Striping”) has no redundancy; it distributes information across all the disks of the array at once in the form of small blocks (“stripes”). This increases performance significantly, but reliability suffers. As with JBOD, for our money we get 100% of the disk capacity.

Let me explain why the reliability of data storage on any composite volume decreases: if any of the hard drives in it fails, all the information is completely and irretrievably lost. According to probability theory, the reliability of a RAID 0 volume equals the product of the reliabilities of its constituent disks, each of which is less than one, so the total reliability is certainly lower than that of any single disk.
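
The product rule is easy to evaluate numerically. The sketch below assumes independent disk failures and uses an invented per-disk reliability of 0.95 purely for illustration; the function name is hypothetical.

```python
import math


def raid0_reliability(disk_reliabilities):
    """Probability that every disk of a striped volume survives.

    Assumes the disks fail independently of one another.
    """
    return math.prod(disk_reliabilities)


single = 0.95  # illustrative figure, not a measured value
for n in (1, 2, 4, 8):
    print(n, round(raid0_reliability([single] * n), 4))
```

Each added disk multiplies the result by another factor below one, so the volume's reliability can only go down as the stripe set grows.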

A good level is RAID 1 (“mirroring”, “mirror”). It protects against the failure of half of the available hardware (in the general case, one of the two hard drives), and provides acceptable write speed and a gain in read speed through parallelization of requests. The disadvantage is that you pay for two hard drives but get the usable capacity of only one.

The starting assumption is that a hard drive is a reliable device. Accordingly, the probability of two disks failing at once equals the product of their individual failure probabilities, i.e. it is lower by orders of magnitude. Unfortunately, real life is not theory! The two hard drives usually come from the same batch and work under the same conditions, and when one of them fails, the load on the remaining one increases. So in practice, if one drive fails, urgent measures must be taken to restore redundancy. For this purpose, with any RAID level (except zero), it is recommended to use hot spare drives. The advantage of this approach is that reliability stays constant. The disadvantage is even greater cost (i.e. three hard drives to store the volume of one).
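For comparison with the theory, here is a minimal sketch of the "product of probabilities" formula for a mirror. It assumes the failures are independent, which is precisely the assumption that drives from one batch under one load tend to violate; the 3% failure figure and the function name are illustrative.

```python
def mirror_loss_probability(p_fail, copies=2):
    """Chance that all mirrored copies fail, assuming independent failures."""
    return p_fail ** copies


# Illustrative assumption: each drive fails with probability 0.03.
p = 0.03
print(f"single drive: {p:.4f}")
print(f"two-way mirror: {mirror_loss_probability(p):.6f}")
print(f"mirror plus rebuilt spare (3 copies): {mirror_loss_probability(p, 3):.8f}")
```

The result is a lower bound on the real risk: any correlation between the drives pushes the true figure up.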

A mirror across many drives is RAID 10. At this level, mirrored pairs of disks are arranged in a “chain”, so the capacity of the resulting volume can exceed that of a single hard disk. The advantages and disadvantages are the same as for RAID 1. As in other cases, it is recommended to include hot spare drives in the array at the rate of one spare per five working drives.

RAID 5 is indeed the most popular of the levels, primarily because of its cost efficiency. By sacrificing the capacity of only one disk in the array for redundancy, we get protection against the failure of any one hard drive in the volume. Writing to a RAID 5 volume consumes additional resources, since extra calculations are required, but reading is faster than from a single hard drive because data streams from several drives are parallelized.

The disadvantages of RAID 5 appear when one of the disks fails: the entire volume goes into critical mode, all write and read operations are accompanied by additional manipulations, performance drops sharply, and the disks begin to heat up. If urgent measures are not taken, the entire volume may be lost. Therefore (see above), you must use a hot spare drive with a RAID 5 volume.

In addition to the basic levels RAID 0 through RAID 5 described in the standard, there are combined levels such as RAID 10, RAID 30, RAID 50 and RAID 15, which different manufacturers each interpret in their own way.

The essence of such combinations is briefly as follows. RAID 10 is a combination of one and zero (see above). RAID 50 is a “zero” built from level 5 volumes. RAID 15 is a “mirror” of “fives”. And so on.

Thus, combined levels inherit the advantages (and disadvantages) of their “parents”. The “zero” in RAID 50 adds no reliability at all, but it has a positive effect on performance. RAID 15 is probably very reliable, but it is not the fastest and, moreover, is extremely uneconomical (the usable capacity of the volume is less than half the total size of the original disk array).

RAID 6 differs from RAID 5 in that each row of data (stripe) has not one but two checksum blocks. The checksums are “multidimensional”, i.e. independent of each other, so even the failure of two disks in the array leaves the original data recoverable. Calculating checksums with the Reed-Solomon method requires more intensive computation than RAID 5, so the sixth level was rarely used in the past. Now it is supported by many products, since they are fitted with specialized chips that perform all the necessary mathematical operations.

According to some studies, restoring integrity after a single disk failure on a RAID 5 volume built from large-capacity SATA disks (400 and 500 gigabytes) ends in data loss in 5% of cases. In other words, in one case out of twenty, a second disk may fail while the RAID 5 array is being regenerated onto the hot spare disk... Hence the recommendations of leading RAID specialists: 1) always make backups; 2) use RAID 6!

Recently, the new levels RAID 1E, RAID 5E and RAID 5EE have appeared. The letter “E” in the name stands for Enhanced.

RAID level-1 Enhanced (RAID level-1E) combines mirroring and data striping. This mixture of levels 0 and 1 is structured as follows. The data in a row is distributed exactly as in RAID 0, that is, the data row itself has no redundancy. The next row of blocks copies the previous one with a shift of one block. Thus, as in standard RAID 1, each data block has a mirror copy on another disk, so the usable capacity of the array equals half the total capacity of the hard disks in it. RAID 1E requires three or more drives.

I really like the RAID 1E level. For a powerful graphics workstation or even a home computer it is the best choice! It has all the advantages of the zero and first levels: excellent speed and high reliability.
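The row-by-row description of RAID 1E can be visualized with a small sketch. The block labels (D0, D1, ...) and the direction of the one-block shift are assumptions made for illustration; real controllers may number and rotate blocks differently.

```python
def raid1e_layout(num_disks, num_data_rows):
    """Return rows of block labels, one column per disk.

    Every data row is striped as in RAID 0; the row after it holds the
    same blocks shifted by one disk, so each copy lands on another disk.
    """
    if num_disks < 3:
        raise ValueError("RAID 1E needs three or more drives")
    rows = []
    first_block = 0
    for _ in range(num_data_rows):
        data = [f"D{first_block + i}" for i in range(num_disks)]
        # Disk i stores the copy of the block held by disk i-1.
        mirror = [data[(i - 1) % num_disks] + "'" for i in range(num_disks)]
        rows.append(data)
        rows.append(mirror)
        first_block += num_disks
    return rows


for row in raid1e_layout(3, 2):
    print("  ".join(f"{label:>4}" for label in row))
```

In every pair of rows a block and its primed copy sit in different columns, which is why the failure of a single disk never removes both copies.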

Let's move on to RAID level-5 Enhanced (RAID level-5E). It is the same as RAID 5, only with the spare drive built into the array. The embedding works as follows: on every disk of the array, 1/N of the space is left free and is used as the hot spare when one of the disks fails. As a result, RAID 5E offers better performance along with reliability, since reads and writes are spread over a larger number of drives in parallel and the spare drive does not sit idle as in RAID 5. Obviously, a spare that is built into the volume cannot be shared with other volumes (dedicated vs. shared). A RAID 5E volume is built on at least four physical disks. The usable capacity of the logical volume is that of N-2 disks.

RAID level-5E Enhanced (RAID level-5EE) is similar to RAID level-5E but distributes the spare space more efficiently and therefore recovers faster. Like RAID 5E, this level distributes data blocks and checksums across rows. But it also distributes the free spare blocks among them rather than simply reserving part of each disk for this purpose. This reduces the time needed to restore the integrity of a RAID 5EE volume. As in the previous case, the spare built into the volume cannot be shared with other volumes. A RAID 5EE volume is built on at least four physical disks. The usable capacity of the logical volume is that of N-2 disks.

Oddly enough, I have not found any mention of a RAID 6E level on the Internet: so far no manufacturer offers such a level or has even announced one. But a RAID 6E (or RAID 6EE?) level could be built on the same principle as the previous one. A hot spare disk must accompany any RAID volume, including RAID 6. Of course, we will not lose information if one or two disks fail, but it is extremely important to start regenerating the array as soon as possible in order to bring the system out of “critical” mode quickly. Since the need for a hot spare disk is beyond doubt for us, it would be logical to go further and “spread” it across the array the way RAID 5EE does, in order to get the benefits of using more disks (better read/write speed and faster restoration of integrity).
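The capacity rules quoted so far (all N disks for RAID 0, one disk for a mirror, half for RAID 10 and 1E, N-1 for RAID 5, N-2 for RAID 6, 5E and 5EE) can be collected into one small calculator. This is only a sketch: the function and table names are hypothetical, equal-sized disks are assumed, and the minimum disk counts for RAID 0 and RAID 6 are my assumptions rather than figures from the article.

```python
# level -> (minimum number of disks, usable disks as a function of N)
LEVELS = {
    "RAID 0": (2, lambda n: n),        # minimum of 2 is an assumption
    "RAID 1": (2, lambda n: 1),
    "RAID 10": (4, lambda n: n / 2),
    "RAID 1E": (3, lambda n: n / 2),
    "RAID 5": (3, lambda n: n - 1),
    "RAID 5E": (4, lambda n: n - 2),
    "RAID 5EE": (4, lambda n: n - 2),
    "RAID 6": (4, lambda n: n - 2),    # minimum of 4 is an assumption
}


def usable_capacity(level, num_disks, disk_size_gb):
    """Usable capacity in GB for num_disks equal disks of disk_size_gb."""
    min_disks, usable_disks = LEVELS[level]
    if num_disks < min_disks:
        raise ValueError(f"{level} needs at least {min_disks} disks")
    return usable_disks(num_disks) * disk_size_gb


for level in LEVELS:
    n = max(4, LEVELS[level][0]) if level != "RAID 1" else 2
    print(f"{level:8} on {n} x 500 GB: {usable_capacity(level, n, 500):.0f} GB usable")
```

With four 500 GB disks, RAID 5E and RAID 6 both come out at the capacity of two disks, which makes the cost of the built-in spare directly comparable to the cost of a second checksum.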

RAID levels in "numbers".

In the table I have collected some important parameters of almost all RAID levels, so that you can compare them and understand their essence more clearly.

Level | Redundancy | Disk capacity utilization | Read performance | Write performance | Built-in spare drive | Min. number of disks | Max. number of disks

All the “mirror” levels: RAID 1, 1 + 0, 10, 1E, 1E0.

Let's try once more to understand thoroughly what the difference between these levels is.

RAID 1.
This is the classic mirror. Two (and only two!) hard drives work as one, each being a complete copy of the other. The failure of either drive does not result in the loss of your data, as the controller continues to work with the remaining drive. RAID 1 in numbers: double redundancy, double reliability, double cost. Write performance is equivalent to that of a single hard drive. Read performance is better because the controller can distribute read operations between the two drives.

RAID 10
The essence of this level is that the array disks are combined in pairs into “mirrors” (RAID 1), and all these mirrored pairs are in turn combined into a single striped array (RAID 0). That is why it is sometimes referred to as RAID 1 + 0. The important point is that RAID 10 can combine only an even number of disks (minimum 4, maximum 16). Advantages: reliability is inherited from the “mirror”, and read and write performance from the “zero”.

RAID 1E.
The letter "E" in the name means "Enhanced", i.e. "improved". The principle of this improvement is as follows: the data is striped in blocks across all disks of the array, and then striped again with a shift of one disk. RAID 1E can combine from three to 16 drives. Reliability matches that of the “ten”, and performance is slightly better thanks to the wider striping.

RAID 1E0.
This level is implemented as follows: we create a "zero" array out of RAID 1E arrays. Therefore, the total number of disks must be a multiple of three: a minimum of three and a maximum of sixty! In this case we are unlikely to gain any speed, and the complexity of the implementation may adversely affect reliability. The main advantage is the ability to combine a very large number of disks (up to 60) into one array.

The similarity of all RAID 1X levels lies in their redundancy indicators: exactly 50% of the total capacity of the array disks is sacrificed for the sake of reliability.

RAID array. What is it? What is it for? And how do you create one?

Over the decades of development of the computer industry, computer storage media have come a long evolutionary way. Punched tapes and punched cards, magnetic tapes and drums, magnetic, optical and magneto-optical disks, semiconductor drives - this is just a short list of the technologies already tried. Laboratories around the world are now attempting to create holographic and quantum drives, which should significantly increase recording density and storage reliability.

In the meantime, hard drives have long remained the most common means of storing information in a personal computer. They may also be called HDDs (hard disk drives) or hard disks, but the essence does not change with the name: these are drives with a stack of magnetic platters in a single case.

The first hard drive, the IBM 350, was assembled on January 10, 1955 in a laboratory of the American company IBM. The size of a large cabinet and weighing about a ton, this hard drive held five megabytes of information. From a modern point of view such a capacity is not even laughable, but in the era of punch cards and sequential-access magnetic tapes it was a huge technological breakthrough.


Unloading the first IBM 350 hard drive from an airplane

Less than six decades have passed since that day, and now nobody is surprised by a hard drive that weighs less than two hundred grams, is ten centimeters long and holds a terabyte of information. At the same time, the technology for writing, storing and reading data is fundamentally no different from that used in the IBM 350: the same magnetic platters with read/write heads gliding over them.


The evolution of hard drives against an inch ruler (photo from Wikipedia)

Unfortunately, it is precisely the features of this technology that cause the two main problems with hard drives. The first is that writing, reading and transferring information from the disk to the processor is too slow. In a modern computer, the hard drive is the slowest device and often determines the performance of the whole system.

The second problem is the insufficient safety of information stored on the hard drive. If the hard drive fails, you can irretrievably lose all the data stored on it. And it is good if the damage is limited to the loss of a family photo album (although that is actually bad enough). The destruction of important financial and marketing information can bring down a business.

Regular backups of all data, or only the important data, on the hard drive partly help to protect the stored information. But even then, if the drive breaks, the data that has changed since the last backup will be lost.

Fortunately, there are methods that help eliminate these disadvantages of traditional hard drives. One of them is to build RAID arrays from several hard drives.

What is RAID?

On the Internet and even in modern computer literature you often meet the term “RAID array”, which is actually a tautology, since the acronym RAID already stands for “redundant array of independent disks”.

The name fully reveals the physical meaning of such arrays: a set of two or more hard drives. The joint operation of these disks is managed by a special controller. Thanks to the controller, the operating system sees such an array as a single hard disk, and the user does not have to think about the nuances of controlling each hard drive separately.

There are several basic types of RAID, each of which affects the overall reliability and speed of the array differently compared to single disks. They are designated by a conventional number from 0 to 6. This designation, with a detailed description of the architecture and principle of operation of the arrays, was proposed by specialists at the University of California at Berkeley. In addition to the seven main types of RAID, various combinations are possible. Let's consider them in turn.

RAID 0

This is the simplest type of hard drive array, whose main purpose is to increase the performance of the computer's disk subsystem. This is achieved by dividing the stream of data being written (or read) into several substreams, which are written to (or read from) several hard drives simultaneously. As a result, the total data exchange speed of a two-disk array, for example, increases by 30-50% compared to one hard drive of the same type.

The total capacity of RAID 0 is the sum of the capacities of the hard drives included in it. The data is split into blocks of a fixed length, regardless of the length of the files being written.

The main advantage of RAID 0 is a significant increase in the speed of data exchange with the disk subsystem without any loss of usable hard drive capacity. The disadvantage is a decrease in the overall reliability of the storage system. If any of the RAID 0 disks fails, all the information recorded in the array is irretrievably lost.
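How fixed-length blocks end up on the disks can be shown with a minimal sketch. A simple round-robin order is assumed here, and both function names are hypothetical; the block size is deliberately tiny so that the result is easy to read.

```python
def stripe(data: bytes, num_disks: int, block_size: int):
    """Split data into fixed-size blocks and deal them out round-robin."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), block_size):
        block_number = offset // block_size
        disks[block_number % num_disks].append(data[offset:offset + block_size])
    return disks


def locate_block(logical_block: int, num_disks: int):
    """Map a logical block number to (disk index, block index on that disk)."""
    return logical_block % num_disks, logical_block // num_disks


layout = stripe(b"ABCDEFGH", num_disks=2, block_size=2)
for index, blocks in enumerate(layout):
    print(f"disk {index}: {blocks}")
print("logical block 3 lives at", locate_block(3, 2))
```

File boundaries play no part in the mapping: only the block number decides which disk receives the data, which is why even one large file is read from all disks in parallel.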

RAID 1

Like the one discussed above, this type of array is among the easiest to organize. It is built from two hard drives, each of which is an exact (mirror) copy of the other. Information is written to both disks in the array in parallel. Data is read from both disks simultaneously in successive blocks (query parallelization), which gives a slight increase in read speed compared to a single hard disk.

The total capacity of RAID 1 is equal to the capacity of the smaller of the hard disks included in the array.

Advantages of RAID 1: high reliability of information storage (the data stays intact as long as at least one of the disks in the array is intact) and some increase in read speed. The disadvantage is that when you buy two hard drives, you get the usable capacity of only one. Despite the loss of half the capacity, “mirror” arrays are quite popular due to their high reliability and relatively low cost - a pair of disks is still cheaper than four or eight.

RAID 2

These arrays use an information recovery algorithm based on Hamming codes (named after the American engineer who developed the algorithm in 1950 to correct errors in the operation of electromechanical computers). For the RAID controller to work, two groups of disks are created: one for storing data, the other for storing error correction codes.

This type of RAID is not widespread in home systems due to the excessive redundancy of the number of hard drives - for example, in an array of seven hard drives, only four will be allocated for data. With an increase in the number of disks, redundancy decreases, which is reflected in the table.
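The way redundancy shrinks as the array grows follows from the Hamming code condition: p check disks can protect d data disks when 2^p >= d + p + 1. The sketch below assumes plain single-error-correcting Hamming codes; the function name is hypothetical.

```python
def hamming_check_disks(data_disks: int) -> int:
    """Smallest p such that 2**p >= data_disks + p + 1."""
    p = 0
    while 2 ** p < data_disks + p + 1:
        p += 1
    return p


print("data  check  total  overhead")
for data in (4, 11, 26, 57):
    check = hamming_check_disks(data)
    print(f"{data:4}  {check:5}  {data + check:5}  {check / data:8.1%}")
```

For four data disks this gives three check disks, the seven-disk array mentioned above; with several dozen disks the overhead falls to the 11-19% range.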

The main advantage of RAID 2 is the ability to correct errors on the fly without reducing the speed of data exchange between the disk array and the central processor.

RAID 3 and RAID 4

These two types of disk arrays are very similar in design. In both, several hard drives are used to store information, and one of them is used exclusively for checksums. Three hard drives are enough to create RAID 3 or RAID 4. Unlike RAID 2, data cannot be recovered on the fly: the information is restored over some time after the failed hard disk has been replaced.

The difference between RAID 3 and RAID 4 lies in the level at which data is split. In RAID 3, information is divided into separate bytes, which leads to a serious slowdown when writing or reading a large number of small files. In RAID 4, data is divided into separate blocks whose size does not exceed the size of one disk sector. As a result, small files are processed faster, which is critical for personal computers. For this reason, RAID 4 is more widespread.

A significant drawback of the arrays under consideration is the increased load on the hard drive intended for storing checksums, which significantly reduces its resource.

RAID 5

Disk arrays of this type are in fact a development of the RAID 3 / RAID 4 scheme. Their distinctive feature is that no separate disk is used to store checksums: they are distributed evenly across all the hard disks of the array. This distribution makes it possible to write to several disks at once, which slightly increases the data exchange speed compared to RAID 3 or RAID 4. The increase is not very large, however, because additional system resources are spent on calculating checksums with the "exclusive or" operation. At the same time, read speed increases significantly, since the process is easily parallelized.
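The "exclusive or" checksum is simple enough to demonstrate directly. The sketch below computes the parity of one stripe and rebuilds a lost block from the survivors; the rotation rule in `parity_disk_for_stripe` is only one possible assumption, since the actual placement order depends on the controller.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk_for_stripe(stripe_index: int, num_disks: int) -> int:
    """Assumed rotation: parity moves one disk to the left on every stripe."""
    return (num_disks - 1 - stripe_index) % num_disks


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]   # three data blocks of one stripe
parity = xor_blocks(data)

# Suppose the disk holding data[1] fails: XOR of everything else restores it.
rebuilt = xor_blocks([data[0], data[2], parity])
print("parity:", parity.hex(), "rebuilt:", rebuilt.hex())
print("parity disks for stripes 0..4 on 4 disks:",
      [parity_disk_for_stripe(s, 4) for s in range(5)])
```

Rebuilding needs every surviving block of the stripe, which is why a degraded RAID 5 volume has to read all remaining disks to serve or restore the missing data.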

The minimum number of hard drives for building RAID 5 is three.

Arrays built on the RAID 5 scheme have a very significant drawback. If any drive fails, then after it is replaced it takes several hours to fully restore the information. During this time the intact hard drives of the array work in an extremely intensive mode, which significantly increases the likelihood that a second drive will fail and all information will be lost. It is rare, but it does happen. In addition, while RAID 5 integrity is being restored, the array is almost completely occupied by this process, and current read/write operations are performed with long delays. For most ordinary users this is not critical, but in the corporate sector such delays can lead to financial losses.

RAID 6

To a large extent, this problem is solved by building arrays on the RAID 6 scheme. In these arrays, space equal to the capacity of two hard disks is set aside for checksums, which are likewise distributed cyclically and evenly across the different disks. Two checksums are calculated instead of one, which guarantees data integrity even if two hard drives in the array fail at the same time.

The advantages of RAID 6 are a high degree of information safety and a smaller drop in performance than in RAID 5 while data is being restored after a damaged disk is replaced.

The disadvantage of RAID 6 is that the overall data exchange rate is about 10% lower, because more checksum calculations are needed and more information has to be written and read.
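To show how two checksums can survive the loss of two disks, here is a sketch of one common formulation: a plain XOR checksum P plus a second checksum Q weighted by powers of 2 in the field GF(2^8). The article does not say which scheme a particular controller uses, so the field polynomial 0x11d, the generator 2 and all function names are assumptions, and one byte stands in for each disk block.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    """Inverse of a non-zero element: a**254, because a**255 == 1."""
    return gf_pow(a, 254)


def checksums(data):
    """Return (P, Q) for a list of data bytes, one byte per data disk."""
    p = q = 0
    for index, value in enumerate(data):
        p ^= value
        q ^= gf_mul(gf_pow(2, index), value)
    return p, q


def recover_two(data, x, y, p, q):
    """Rebuild the bytes of failed data disks x and y from survivors, P and Q."""
    for index, value in enumerate(data):
        if index not in (x, y):
            p ^= value                              # leaves Dx ^ Dy
            q ^= gf_mul(gf_pow(2, index), value)    # leaves gx*Dx ^ gy*Dy
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    dx = gf_mul(gf_inv(gx ^ gy), q ^ gf_mul(gy, p))
    return dx, p ^ dx


original = [0x12, 0x34, 0x56, 0x78]
p, q = checksums(original)
damaged = [0x12, None, 0x56, None]                  # disks 1 and 3 are lost
print([hex(v) for v in recover_two(damaged, 1, 3, p, q)])
```

The second checksum costs several field multiplications per byte where RAID 5 needs a single XOR, which is the extra computation the text refers to.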

RAID Combo Types

In addition to the basic types discussed above, various combinations of them are widely used to compensate for particular disadvantages of simple RAID. In particular, the RAID 10 and RAID 0 + 1 schemes are widespread. In the first case, a pair of mirrored arrays is combined into RAID 0; in the second, on the contrary, two RAID 0 arrays are combined into a mirror. In both cases, the increased performance of RAID 0 is added to the data safety of RAID 1.

Often, in order to increase the protection of important information, RAID 51 or RAID 61 schemes are used: mirroring arrays that are already highly protected ensures exceptional data safety in case of any failure. At home, however, such arrays are not practical because of their excessive redundancy.
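Although both schemes combine the same two ideas, they do not react identically to a second failed disk. The sketch below enumerates every two-disk failure in a four-disk array; the disk numbering and the grouping into (0, 1) and (2, 3) are assumptions for illustration.

```python
from itertools import combinations


def raid10_survives(failed, mirror_pairs):
    """RAID 10 lives while no mirrored pair has lost both of its disks."""
    return all(not set(pair) <= failed for pair in mirror_pairs)


def raid01_survives(failed, stripe_sets):
    """RAID 0+1 lives while at least one stripe set is fully intact."""
    return any(not (set(stripe_set) & failed) for stripe_set in stripe_sets)


groups = [(0, 1), (2, 3)]   # mirror pairs for RAID 10, stripe sets for RAID 0+1
two_disk_failures = [set(c) for c in combinations(range(4), 2)]

raid10_ok = sum(raid10_survives(f, groups) for f in two_disk_failures)
raid01_ok = sum(raid01_survives(f, groups) for f in two_disk_failures)
print(f"RAID 10  survives {raid10_ok} of {len(two_disk_failures)} two-disk failures")
print(f"RAID 0+1 survives {raid01_ok} of {len(two_disk_failures)} two-disk failures")
```

Under these assumptions RAID 10 tolerates more of the double failures, because in RAID 0 + 1 a single lost disk already takes a whole stripe set out of service.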

Building a disk array - from theory to practice

The construction and management of any RAID is handled by a specialized RAID controller. To the great relief of the average personal computer user, in most modern motherboards these controllers are already implemented in the south bridge of the chipset. So to build an array, it is enough to acquire the required number of hard disks and select the desired type of RAID in the corresponding section of the BIOS setup. After that, instead of several hard drives the system will show only one, which can be divided into partitions and logical drives if desired. Please note that those who still use Windows XP will need to install an additional driver.

External RAID controller with four SATA ports

Note that integrated controllers, as a rule, are able to create RAID 0, RAID 1 and their combinations. To create more complex arrays, you will still need to purchase a separate controller.

And finally, one more tip: to create a RAID, buy hard drives of the same size, from one manufacturer, of one model, and preferably from one batch. Then they will have the same on-board logic, and the array built from them will operate most stably.
