
R.A.I.D.

Everything you always wanted to know about RAID but didn't know enough to ask

Presented to the APCU by John Donahue, October 31, 2009

What we will cover

Exactly what is RAID?

How does it work?

When and where would you use it?

Hardware and software implementations

What are some best practices for RAID?

My recommendations for a workstation

Demo – how to configure a RAID (if I can access my servers over the internet)

What exactly is R.A.I.D.?

The most common meaning today is:
– Redundant Array of Independent Disks
– Originally: Redundant Array of Inexpensive Disks

The term was coined in 1987 by Dave Patterson of the University of California, Berkeley, in one of his projects.

The original motivation was to replace large and very expensive mainframe DASD drives with an array of cheaper disk drives, like the type often used in PCs at that time.

Now it’s basically connecting a collection of hard drives in an arrangement that enhances reliability, speed, or both.

A word to the wise

It may be obvious, but it’s still worth mentioning.

RAID is NOT a substitute for “backups”.

It can minimize the immediate impact of disk failures, but it cannot recover data lost to other failures and disasters.

Offsite backups are still the best insurance for data recovery

Most common flavors of RAID

RAID-0: Not truly a RAID (no redundant drive)
– Used for simply striping data across multiple disks
– Possible performance gain, but no fault tolerance

RAID-1: Simple mirroring (2 disks), single fault tolerant
– Fastest write performance
– Highest (50%) overhead in storage

RAID-5: Striping w/parity (3 or more disks), single fault tolerant
– Slower write performance
– Lowest overhead in storage (33% or less; w/5 disks only 20%)

RAID-6: Striping w/parity, double fault tolerant
– More $$$ and used only for exceptionally high reliability

RAID-10 (or 1+0) is mirroring plus striping

Less common flavors of RAID

JBOD is not RAID, but is sometimes mentioned
– Just a Bunch Of Disks strung together to look like a single volume, a.k.a. a spanned volume
– No performance gain or reliability protection

RAID-2 never caught on (complex, expensive)

RAID-3 and 4: Dedicated drive for parity (bottleneck on the parity drive; rare)

RAID-7 is not an industry standard; very rare

RAID-01 (or 0+1), 50, and 60
– They are similar in concept to RAID-10 but less common. All are variations of mirroring and striping.

How It Works, cont.

RAID-0 (striping)

Disk 1

Block a Block f Block k

Disk 2

Block b Block g Block l

Disk 3

Block c Block h Block m

Disk 4

Block d Block i Block n

Disk 5

Block e Block j Block o
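The striping layout above follows a simple round-robin rule, which can be sketched in a few lines of Python. This is only an illustration of the block placement shown on the slide; the function names `raid0_location` and `raid0_layout` are made up for this example, and a real controller stripes in larger chunks rather than single lettered blocks.

```python
from string import ascii_lowercase


def raid0_location(block_index: int, num_disks: int) -> tuple[int, int]:
    """Return (disk_number, stripe_row) for a logical block.

    disk_number is 1-based to match the slide; stripe_row is 0-based.
    """
    return block_index % num_disks + 1, block_index // num_disks


def raid0_layout(num_blocks: int, num_disks: int) -> dict[int, list[str]]:
    """Distribute blocks 'a', 'b', 'c', ... round-robin across the disks."""
    layout = {disk: [] for disk in range(1, num_disks + 1)}
    for i in range(num_blocks):
        disk, _row = raid0_location(i, num_disks)
        layout[disk].append(ascii_lowercase[i])
    return layout


if __name__ == "__main__":
    for disk, blocks in raid0_layout(15, 5).items():
        print(f"Disk {disk}: " + " ".join(f"Block {b}" for b in blocks))
```

Because consecutive blocks land on different disks, a large read or write can keep all five drives busy at once, which is where the possible performance gain comes from.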

How It Works, cont.

RAID-1 Mirroring

Disk 1

Block a Block b Block c Block d Block e

Disk 2

Block a Block b Block c Block d Block e

How It Works, cont.

RAID 5 Striping with parity

Disk 1

Block a Block e Block i Block m ECC 5

Disk 2

Block b Block f Block j ECC 4 Block q

Disk 3

Block c Block g ECC 3 Block n Block r

Disk 4

Block d ECC 2 Block k Block o Block s

Disk 5

ECC 1 Block h Block l Block p Block t
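In the layout above, the parity (ECC) block moves one disk to the left on each successive stripe, so no single drive becomes the parity bottleneck. A minimal Python sketch that reproduces this particular rotation follows. The function name `raid5_layout` is hypothetical, and real controllers may rotate parity in a different order.

```python
from string import ascii_lowercase


def raid5_layout(num_disks: int, num_rows: int) -> list[list[str]]:
    """Build the stripe rows of a RAID-5 set as lists of cell labels.

    Parity starts on the last disk and moves one disk to the left per row,
    as drawn on the slide. Data blocks are lettered a, b, c, ... so this
    sketch only works for up to 26 data blocks.
    """
    rows = []
    next_block = 0
    for row in range(num_rows):
        parity_disk = (num_disks - 1 - row) % num_disks  # 0-based disk index
        cells = []
        for disk in range(num_disks):
            if disk == parity_disk:
                cells.append(f"ECC {row + 1}")
            else:
                cells.append(f"Block {ascii_lowercase[next_block]}")
                next_block += 1
        rows.append(cells)
    return rows


if __name__ == "__main__":
    for row in raid5_layout(5, 5):
        print("  ".join(f"{cell:8}" for cell in row))
```

Reading the printed grid column by column gives the per-disk contents listed on the slide.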

How It Works, cont.

RAID-10 or 1+0 (mirroring with striping)

Disk 1

Block a Block e

Disk 2 (mirror of Disk 1)

Block a Block e

Disk 3

Block b Block f

Disk 4 (mirror of Disk 3)

Block b Block f

Disk 5

Block c Block g

Disk 6 (mirror of Disk 5)

Block c Block g

Disk 7

Block d Block h

Disk 8 (mirror of Disk 7)

Block d Block h

RAID Comparison Table

RAID  Min drvs  Data prot.  Read perf.  Write perf.  Cap. avail.  Usage comments
0     2         no          high        high         100%         High-end workstations, very transitory data
1     2         yes         high        med.         50%          Operating systems, transaction databases
5     3         yes         high        low          67%-94%      Data warehousing, web serving, archiving
6     4         yes         high        low          50%-88%      High-availability solutions, servers with large capacity
10    4         yes         high        med.         50%          Fast databases, application servers
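The capacity figures in the table are simple arithmetic on the number of drives, and a short Python sketch can reproduce them. The function name `usable_fraction` is made up for this example, RAID-1 is assumed to be a two-way mirror, and the 16-drive upper end is an assumption chosen because it gives 93.75% for RAID-5 and 87.5% for RAID-6, which round to the table's 94% and 88%.

```python
def usable_fraction(level: str, num_drives: int) -> float:
    """Fraction of raw capacity available for data, assuming equal-size drives."""
    minimum = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if num_drives < minimum[level]:
        raise ValueError(f"RAID-{level} needs at least {minimum[level]} drives")
    if level == "0":
        return 1.0                              # no redundancy at all
    if level in ("1", "10"):
        return 0.5                              # every block is stored twice
    if level == "5":
        return (num_drives - 1) / num_drives    # one drive's worth of parity
    return (num_drives - 2) / num_drives        # RAID-6: two drives' worth of parity


if __name__ == "__main__":
    for level, drives in [("0", 2), ("1", 2), ("5", 3), ("5", 16), ("6", 4), ("6", 16), ("10", 4)]:
        print(f"RAID-{level} with {drives} drives: {usable_fraction(level, drives):.2%} usable")
```

The same formula explains the earlier overhead figures: RAID-5 on 3 disks gives up 1/3 of its capacity, and on 5 disks only 1/5.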

When and Where would you want to use RAID

In very general terms, let’s consider the impacts of a disk failure.

If a single workstation has a disk failure, you probably have one worker unable to perform their tasks.

If a server has a disk failure, you very likely could have hundreds or many thousands of workers unable to perform their tasks.

Any place where a disk failure would have a very large impact is a candidate for RAID.

When and Where would you want to use RAID cont.

Servers (Web, storage, database, email, application, etc.)

High reliability workstations

Process control computers

Any place where downtime due to disk failures would be unacceptable

Where the higher disk performance provided by striping is important

Just because you want to play with it

How Is RAID Implemented

Hardware (preferred)
– Motherboard or SCSI, SATA, PATA controller card handles all of the details of spanning and/or fault tolerance
– This option is generally better, but more $$$

Software
– Operating system handles the details of spanning and/or fault tolerance
– This option should be considered if the hardware option is not available or is constrained by $$$

Hardware RAID

Some newer motherboards have RAID support

Server controller cards provide RAID ($$$)
– Compaq (SCSI and Fibre Channel)
    Hot-pluggable
    Hot spare
– Hewlett Packard
– Adaptec
– LSI MegaRAID

Workstation SATA controllers (less $$$)
– Options and prices change daily. Check out Fry’s for starters. I have had good results with SIIG and heard good things about Highpoint controllers.

Operating Systems That Support a Software RAID

Apple – RAID-0, 1, 5, 10

FreeBSD – RAID-0, 1, 3,

Linux – RAID-0, 1, 4, 5, 6

Microsoft server – RAID-0, 1, 5

Windows XP Professional – RAID-0, JBOD
– A patch exists to add RAID-1, 5

Windows 7 Pro – RAID-0, 1 (based on beta)

NetBSD – RAID-0, 1, 4, 5, 10
– with software RAIDframe

OpenBSD – aims to support RAID-0, 4, 5
– with software SoftRAID

OpenSolaris and Solaris 10 – RAID-0, 1, 5, 6

Best Practices for RAID

For servers, you always want RAID, and the preferred type of RAID depends on the server function.

Web, and many types of app servers
– RAID-1 for operating system
– RAID-5 for everything else

Database and write-intensive servers
– RAID-1 for operating system
– RAID-1 for database

Note: RAID 6 or 10 is also a possible option if the added reliability is worth the extra $$$.

Best Practices for RAID, cont.

For workstations
– RAID-1 for operating system
– RAID-1 for high performance on work disks
    If multiple disks and performance are more important than fault tolerance on work disks, consider striped RAID-0
– RAID-5 for large local storage (not heavy write activity)

My Recommendations for a Workstation

If you have a RAID option on your motherboard, use it for your operating system drives.

If you don’t have hardware RAID and don’t want to buy it, consider Windows 7 Pro software RAID-1.

Depending on your needs, you might also consider RAID-0 where disk performance is more important than fault tolerance.

Consider RAID-5 for disk storage where you want fault tolerance but there is not much write/change activity.

For more archival-type disk storage, consider a network-connected disk storage device that uses RAID-5.

Network Attached RAID

Buffalo LinkStation supports RAID 0, 1, 5, 10 with (4) 500 GB drives and a gigabit Ethernet connection.

This 2 terabyte version sold at Fry’s for $400 in the last August flyer.

Demo how to configure RAID

Hardware configuration on a Compaq server using the RAID configuration tool.

Software configuration on a Windows Server 2003

Software configuration on a Windows 7 Pro (or Ultimate) workstation

Questions?

How RAID Parity Works

The trick is done via “XOR” (Exclusive OR)

Drive 1   0110   value 6
Drive 2   1010   value 10
Drive 3   1001   value 9
Drive 4   0101   ECC

In this case, Drive 4 happens to contain the parity for the actual data in the stripe that was written onto Drives 1, 2, 3. If any one of the drives 1, 2, 3 fails, the data that was on that drive can be recovered by doing an XOR on the data from the remaining drives.

In this example let’s say drive 2 fails. Then:

  0110   From drive 1
  1001   From drive 3
  0101   From drive 4
  ----   XOR
  1010   Recovered drive 2 data
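The XOR example above is easy to check in code. The following Python sketch uses the same four-bit values as the slide; the helper name `xor_all` is made up for this example, and a real array applies the same operation to whole blocks rather than four-bit numbers.

```python
from functools import reduce
from operator import xor


def xor_all(values: list[int]) -> int:
    """XOR a list of integers together (0 is the identity for XOR)."""
    return reduce(xor, values, 0)


if __name__ == "__main__":
    drive1, drive2, drive3 = 0b0110, 0b1010, 0b1001

    # Parity written to drive 4 when the stripe is stored.
    drive4 = xor_all([drive1, drive2, drive3])
    print(f"Parity on drive 4:      {drive4:04b}")

    # Drive 2 fails: XOR the surviving data with the parity to rebuild it.
    recovered = xor_all([drive1, drive3, drive4])
    print(f"Recovered drive 2 data: {recovered:04b}")
```

The same call works for any single missing drive, because XOR-ing a value in twice cancels it out and leaves only the missing one.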

Additional Tidbits

With software RAID you may have to use a floppy, CD, or USB drive to boot to the second plex of the system mirror.

Software RAID will force a rebuild if a proper shutdown was not used.

Rebuild time can be considerable
– Very rough estimates are 10-15 min. per gigabyte
– 1 TB hardware RAID-5 (Buffalo) took several days
– 30 GB hardware RAID-1 took over 7 hours (software)
– 120 GB software RAID-1 took over 15 hours (hardware)
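To turn the 10-15 minutes per gigabyte rule of thumb into a planning number, a small Python sketch follows. The function name `rebuild_hours` is made up for this example, and the result is only as good as the rough estimate behind it; actual rebuild times depend on the controller, the drives, and how busy the array is.

```python
def rebuild_hours(capacity_gb: float,
                  min_per_gb_low: float = 10,
                  min_per_gb_high: float = 15) -> tuple[float, float]:
    """Return a (low, high) rebuild-time estimate in hours.

    The default rates are the very rough 10-15 min. per gigabyte from the slide.
    """
    return (capacity_gb * min_per_gb_low / 60,
            capacity_gb * min_per_gb_high / 60)


if __name__ == "__main__":
    # 1 TB is treated as 1000 GB here, which is an assumption.
    for size_gb in (30, 120, 1000):
        low, high = rebuild_hours(size_gb)
        print(f"{size_gb} GB: roughly {low:.1f} to {high:.1f} hours")
```

For a 30 GB mirror this gives 5 to 7.5 hours, and for 1000 GB about a week or more, which is why a long rebuild should be planned for.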

RAID 1+0 vs. 0+1

RAID 0+1:

We stripe together drives 1, 2, 3, 4 and 5 into RAID 0 stripe set "A", and drives 6, 7, 8, 9 and 10 into RAID 0 stripe set "B". We then mirror A and B using RAID 1. If one drive fails, say drive #2, then the entire stripe set "A" is lost, because RAID 0 has no redundancy; the RAID 0+1 array continues to chug along because the entire stripe set "B" is still functioning. However, at this point you are reduced to running what is in essence a straight RAID 0 array until drive #2 can be fixed. If in the meantime drive #9 goes down, you lose the entire array.

RAID 1+0:

We mirror drives 1 and 2 to form RAID 1 mirror set "A"; 3 and 4 become "B"; 5 and 6 become "C"; 7 and 8 become "D"; and 9 and 10 become "E". We then do a RAID 0 stripe across sets A through E. If drive #2 fails now, only mirror set "A" is affected; it still has drive #1 so it is fine, and the RAID 1+0 array continues functioning. If while drive #2 is being replaced drive #9 fails, the array is fine, because drive #9 is in a different mirror pair from #2. Only two failures in the same mirror set will cause the array to fail, so in theory, five drives can fail--as long as they are all in different sets--and the array would still be fine.

RAID 1+0 vs. 0+1, cont.

Clearly, RAID 1+0 is more robust than RAID 0+1. Now, if the controller running RAID 0+1 were smart, when drive #2 failed it would continue striping to the other four drives in stripe set "A", and if drive #9 later failed it would "realize" that it could use drive #4 in its stead, since it should have the same data. This functionality would theoretically make RAID 0+1 just as fault-tolerant as RAID 1+0. Unfortunately, most controllers aren't that smart. It pays to ask specific questions about how a multiple RAID array implementation handles multiple drive failures, but in general, a controller won't swap drives between component sub-arrays unless the manufacturer of the controller specifically says it will.

The same impact on fault tolerance applies to rebuilding. Consider again the example above. In RAID 0+1, if drive #2 fails, the data on five hard disks will need to be rebuilt, because the whole stripe set "A" will be wiped out. In RAID 1+0, only drive #2 has to be rebuilt. Again here, the advantage is to RAID 1+0.
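The difference between the two arrangements can be counted directly. The Python sketch below enumerates every possible pair of failed drives in the ten-drive example and checks which ones each array survives. The function names are made up for this example, and the RAID 0+1 model assumes the "not smart" controller described above, where a stripe set is lost as soon as any one of its drives fails.

```python
from itertools import combinations

DRIVES = range(1, 11)  # drives #1 through #10, as in the example


def raid01_survives(failed: set[int]) -> bool:
    """RAID 0+1: stripe set A is drives 1-5, stripe set B is drives 6-10.

    The array survives only while at least one stripe set has no failed drive.
    """
    set_a_ok = not any(d <= 5 for d in failed)
    set_b_ok = not any(d >= 6 for d in failed)
    return set_a_ok or set_b_ok


def raid10_survives(failed: set[int]) -> bool:
    """RAID 1+0: mirror pairs are (1,2), (3,4), (5,6), (7,8), (9,10).

    The array fails only if both drives of the same pair have failed.
    """
    return not any({a, a + 1} <= failed for a in range(1, 10, 2))


def survival_count(survives, num_failures: int) -> tuple[int, int]:
    """Return (survivable combinations, total combinations) of failed drives."""
    combos = [set(c) for c in combinations(DRIVES, num_failures)]
    return sum(1 for c in combos if survives(c)), len(combos)


if __name__ == "__main__":
    for name, fn in (("RAID 0+1", raid01_survives), ("RAID 1+0", raid10_survives)):
        ok, total = survival_count(fn, 2)
        print(f"{name}: survives {ok} of {total} two-drive failures")
```

Under these assumptions RAID 1+0 survives 40 of the 45 possible two-drive failures, while RAID 0+1 survives only the 20 cases where both failed drives are in the same stripe set.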

Backup Slides if Internet Connection is Not Available to My Servers

Windows 2003 Server

Windows 7 beta

Compaq Hardware RAID Configurator Showing Physical Disk View

Compaq Hardware RAID Configurator Showing “Logical Volume” View

Notice that Array A, which is really three physical disks, will appear as a single physical disk to the operating system and will contain the C: drive for the Windows operating system. The next slide shows this in Windows Disk Management.

Windows Disk Management View of Compaq RAID Arrays

Notice that “Disk 2” (drive C) appears as a single physical drive to Windows.