dbTalk Databases Forums  

Is RAID-5 Still Evil?

comp.databases.informix

#1
red_valsen
Is RAID-5 Still Evil? - 07-25-2011, 08:07 AM

Most of us are familiar with the problems inherent in the RAID-5 data
distribution scheme. But has the technology improved in the 8 years
since BAARF was conceived, and Art Kagel penned his superb polemic
condemning RAID? Has the industry addressed the problems associated
with partial media failure while using RAID?

#2
Art Kagel
Re: Is RAID-5 Still Evil? - 07-25-2011, 09:33 AM

No. Just come up with even loonier ideas like RAID6, RAIDZ, RAID61!

Art

Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/

Disclaimer: Please keep in mind that my own opinions are my own opinions and
do not reflect on my employer, Advanced DataTools, the IIUG, nor any other
organization with which I am associated either explicitly, implicitly, or by
inference. Neither do those opinions reflect those of other individuals
affiliated with any entity with which I am affiliated nor those of the
entities themselves.




#3
Clive Eisen
Re: Is RAID-5 Still Evil? - 07-25-2011, 09:38 AM

Technology has, of course, improved; the definitions have not changed.

RAID-5 is still (in effect) 1 parity disk for n data disks.

So

Just Say No

#4
Neil Truby
Re: Is RAID-5 Still Evil? - 07-25-2011, 05:56 PM

Quote:
"Art Kagel" <art.kagel (AT) gmail (DOT) com> wrote in message
news:mailman.1024.1311604448.1071.informix-list (AT) iiug (DOT) org...
No. Just come up with even loonier ideas like RAID6, RAIDZ, RAID61!

Why don't you re-state the fundamental objections for us?

#5
Art Kagel
Re: Is RAID-5 Still Evil? - 07-25-2011, 06:14 PM

Haven't posted it here in a long while, so you asked for it, you got it.
Without further introduction, my RAID5 Rant:

*RAID5 versus RAID10 (or even RAID3 or RAID4)*

First let's get on the same page so we're all talking about apples.

*What is RAID5?*
OK, here is the deal: RAID5 uses ONLY ONE parity drive per stripe. Many
RAID5 arrays are assembled from five drives (4 data and 1 parity, though it
is not a single drive that holds all of the parity as in RAID3 & RAID4, but
read on). If your drive counts are different, adjust the calculations
appropriately. If you have 10 drives of, say, 20GB each for 200GB, RAID5
will use 20% for parity (assuming you set it up as two 5-drive arrays), so
you will have 160GB of storage. Now, since RAID10, like mirroring (RAID1),
uses 1 (or more) mirror drive for each primary drive, you are using 50% for
redundancy, so to get the same 160GB of storage you will need 8 pairs, or
sixteen 20GB drives, which is why RAID5 is so popular. This intro is just
to put things into perspective.
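
The capacity arithmetic above can be checked with a short sketch. The function names are hypothetical, and the sketch assumes each RAID5 array gives up exactly one drive's worth of space to parity and that RAID10 uses two-way mirroring.

```python
def raid5_usable_gb(total_drives: int, drive_gb: int, drives_per_array: int) -> int:
    """Usable capacity when the drives are split into equal RAID5 arrays.

    Assumption: each array loses one drive's worth of space to parity.
    """
    if total_drives % drives_per_array != 0:
        raise ValueError("drives must divide evenly into arrays")
    arrays = total_drives // drives_per_array
    return arrays * (drives_per_array - 1) * drive_gb


def raid10_drives_needed(usable_gb: int, drive_gb: int) -> int:
    """Drives needed for the same usable capacity with two-way mirroring."""
    pairs = -(-usable_gb // drive_gb)  # ceiling division
    return 2 * pairs


if __name__ == "__main__":
    usable = raid5_usable_gb(total_drives=10, drive_gb=20, drives_per_array=5)
    print(usable)                            # 160
    print(raid10_drives_needed(usable, 20))  # 16
```

With the figures from the text, ten 20GB drives in two 5-drive RAID5 arrays yield 160GB, and matching that with RAID10 takes sixteen drives.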

RAID5 is physically a stripe set like RAID0 but with failed drive recovery
included. RAID5 reserves one disk block out of each stripe block for parity
data. The parity block contains an error correction code which can correct
any error in the RAID5 block; in effect it is used in combination with the
remaining data blocks to recreate any single missing block, gone missing
because a drive has failed. The innovation of RAID5 over RAID3 & RAID4 is
that the parity is distributed on a round robin basis so that there can be
independent reading of different blocks from the several drives. This is
why RAID5 became more popular than RAID3 & RAID4, which must synchronously
read the same block from all drives together.
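
How a parity block recreates a single missing block can be sketched in a few lines. This assumes plain byte-wise XOR parity, which is the usual RAID5 scheme; the function names and the 4-byte "blocks" are purely illustrative.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks):
    """Rebuild the single missing block of a stripe.

    surviving_blocks holds every remaining block of the stripe, data and
    parity alike; XOR-ing them together yields the block that was lost.
    """
    return xor_blocks(surviving_blocks)


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]  # 4 data blocks
    parity = xor_blocks(data)                    # 1 parity block
    # Pretend the drive holding data[1] has failed.
    survivors = [data[0], data[2], data[3], parity]
    assert reconstruct(survivors) == data[1]
```

Note that this only works when the array knows which block is missing. XOR parity on its own cannot tell which block of a stripe is wrong.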

So, if Drive2 in a RAID5 array fails, blocks 1, 2, 4, 5, 6 & 7 are data
blocks on this drive and blocks 3 and 8 are parity blocks on this drive.
That means that the parity on Drive5 will be used to recreate the data
block from Drive2 if block 1 is requested before a new drive replaces
Drive2 or during the rebuilding of the new Drive2 replacement. Likewise the
parity on Drive1 will be used to repair block 2 and the parity on Drive3
will repair block 4, etc. For block 3, all the data is safely on the
remaining drives, but during the rebuilding of Drive2's replacement a new
parity block will be calculated from the block 3 data and will be written
to Drive2.
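
The round-robin placement in this example can be written down as a small sketch. The rotation formula is an assumption chosen to reproduce the example (parity for block 1 on Drive5, block 2 on Drive1, and so on); real controllers use a variety of rotations. What the text calls a "block" is a stripe number here.

```python
def parity_drive(stripe: int, drives: int = 5) -> int:
    """1-based drive that holds the parity block of a 1-based stripe.

    Assumed rotation: stripe 1 -> last drive, stripe 2 -> Drive1, and so on.
    """
    return (stripe - 2) % drives + 1


def role_on_drive(stripe: int, drive: int, drives: int = 5) -> str:
    """Whether the given drive holds data or parity for the given stripe."""
    return "parity" if parity_drive(stripe, drives) == drive else "data"


if __name__ == "__main__":
    for stripe in range(1, 9):
        print(
            f"stripe {stripe}: Drive2 holds {role_on_drive(stripe, 2)}, "
            f"parity is on Drive{parity_drive(stripe)}"
        )
```

Under this rotation Drive2 holds parity only for stripes 3 and 8, so losing Drive2 costs data in stripes 1, 2, 4, 5, 6 and 7 and costs only redundancy in stripes 3 and 8.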

Now, when a disk block is read from the array, the RAID software/firmware
calculates which RAID block contains the disk block, which drive the disk
block is on, and which drive contains the parity block for that RAID block,
and reads ONLY the one data drive. It returns the data block. If you later
modify the data block, it recalculates the parity by subtracting the old
block and adding in the new version, then in two separate operations it
writes the data block followed by the new parity block. To do this it must
first read the parity block from whichever drive contains the parity for
that stripe block and reread the unmodified data for the updated block from
the original drive. This read-read-write-write is known as the RAID5 write
penalty. Since these two writes are sequential and synchronous, the write
system call cannot return until the reread and both writes complete, for
safety, so writing to RAID5 is up to 50% slower than RAID0 for an array of
the same capacity. (Some software RAID5s avoid the re-read by keeping an
unmodified copy of the original block in memory. In some application spaces
this can require very large memory caches on the array, which are usually
far more expensive than mechanical drives.)
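
The read-read-write-write sequence can be sketched as follows. This is a toy in-memory model, not any vendor's implementation: the class name and counters are hypothetical, parity is assumed to be plain XOR, and "subtracting" and "adding" a block are both the same XOR operation.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


class Raid5Stripe:
    """One stripe held in memory, with counters for simulated physical I/O."""

    def __init__(self, data_blocks):
        self.data = list(data_blocks)
        parity = bytes(len(self.data[0]))  # all-zero block
        for block in self.data:
            parity = xor_bytes(parity, block)
        self.parity = parity
        self.reads = 0
        self.writes = 0

    def small_write(self, index: int, new_block: bytes) -> None:
        """Update one data block using the read-modify-write sequence."""
        old_block = self.data[index]   # re-read the unmodified data block
        self.reads += 1
        old_parity = self.parity       # read the old parity block
        self.reads += 1
        # Remove the old block's contribution, then add the new block's.
        new_parity = xor_bytes(xor_bytes(old_parity, old_block), new_block)
        self.data[index] = new_block   # write the data block
        self.writes += 1
        self.parity = new_parity       # write the parity block
        self.writes += 1


if __name__ == "__main__":
    stripe = Raid5Stripe([b"AAAA", b"BBBB", b"CCCC", b"DDDD"])
    stripe.small_write(1, b"ZZZZ")
    print(stripe.reads, stripe.writes)  # 2 2
```

One logical write costs two physical reads and two physical writes, where RAID0 would need a single write.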

*Now what is RAID10:*

RAID10 is one of the combinations of RAID1 (mirroring) and RAID0 (striping)
which are possible. There used to be confusion about what RAID01 or RAID10
meant, and different RAID vendors defined them differently. Several years
ago I proposed the following standard language, which seems to have taken
hold. When N mirrored pairs are striped together, this is called RAID10 or
RAID 1+0 because the mirroring (RAID1) is applied before striping (RAID0).
The other option is to create two stripe sets and mirror them one to the
other; this is known as RAID01 or RAID 0+1 (because the RAID0 is applied
first). In either a RAID01 or RAID10 system each and every disk block is
completely duplicated on its drive's mirror. Performance-wise, RAID01 and
RAID10 are functionally equivalent. The difference comes in during recovery,
where RAID01 suffers from some of the same problems I will describe
affecting RAID5 while RAID10 does not.
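
One way to see the recovery difference between the two layouts is to count which second failure would be fatal. This is a minimal sketch under a common simplification that is not stated in the post: in RAID01 the whole stripe set containing the failed drive is assumed to go offline. The function names are hypothetical.

```python
def fatal_second_failures_raid10(pairs: int) -> tuple:
    """(fatal, remaining) drives once one drive of a RAID10 array has failed.

    Only the failed drive's own mirror partner is fatal.
    """
    return 1, 2 * pairs - 1


def fatal_second_failures_raid01(drives_per_stripe_set: int) -> tuple:
    """(fatal, remaining) drives once one drive of a RAID01 array has failed.

    Assumption: the stripe set containing the failed drive is offline as a
    whole, so losing any drive of the surviving stripe set is fatal.
    """
    return drives_per_stripe_set, 2 * drives_per_stripe_set - 1


if __name__ == "__main__":
    # Eight drives either way: 4 mirrored pairs, or 2 stripe sets of 4.
    print(fatal_second_failures_raid10(4))  # (1, 7)
    print(fatal_second_failures_raid01(4))  # (4, 7)
```

With eight drives, one of the seven survivors is critical under RAID10, against four of the seven under RAID01.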

Now if a drive in the RAID5 array fails, is removed, or is shut off, data
is returned by reading the blocks from the remaining drives in the array
and calculating the missing data using the parity, assuming the defunct
drive is not the parity block drive for that RAID block. Note that it takes
4 physical reads to replace the missing disk block (for a 5 drive array)
for four out of every five disk blocks, leading to a 64% performance
degradation until the problem is discovered and a new drive can be mapped
in to begin recovery. Performance is degraded further during recovery
because all drives are being actively accessed in order to rebuild the
replacement drive, which will cause head contention (see below).

If a drive in the RAID10 array fails, data is returned from its mirror
drive in a single read with only minor (6.25% on average for a 4 pair array
as a whole) performance reduction when two non-contiguous blocks are needed
from the damaged pair (since the two blocks cannot be read in parallel from
both drives) and none otherwise.

One begins to get an inkling of what is going on and why I dislike RAID5,
but, as they say on late night infomercials, wait, there's more!

*What's wrong besides a bit of performance that I didn't know I was missing?*

OK, so that brings us to the final question, which is: What is the REAL
problem with RAID5? It does recover a failed drive, right? So writes are
slower; I don't do enough writing to worry about it, and the cache helps a
lot also. I've got LOTS of cache! (Do you catch the logical and financial
fallacy of that last piece?)

The problem is that despite the improved reliability of modern drives and
the improved error correction codes on most drives, and even despite the
additional 8 bytes of error correction that EMC and some other vendors put
on their more advanced disk subsystems (if you are lucky enough to use one
of these), it is more than a little possible that a drive will become flaky
and begin to return garbage. This is known as partial media failure. Up to
a point the drive itself will correct single bit errors. Now SCSI drives
reserve several hundred disk blocks to be remapped to replace fading
sectors with unused ones when read errors cannot be corrected by rewriting
the sector. However, if the drive is going, these will not last very long
and will run out, and SCSI does NOT report correctable errors back to the
OS! Therefore you will not know the drive is becoming unstable until it is
too late, there are no more replacement sectors, and the drive begins to
return garbage. [Note that the recently popular IDE/ATA/SATA drives do not
(to my knowledge) include bad sector remapping in their hardware, so
garbage is returned that much sooner.] When a drive returns garbage, since
RAID5 does not EVER check parity on read (RAID3 & RAID4 do, BTW, and both
perform better for databases than RAID5 to boot), when you write the
garbage sector back, garbage parity will be calculated and your RAID5
integrity is lost! Similarly, if a drive fails and one of the remaining
drives is flaky, the replacement will be rebuilt with garbage also,
propagating the problem to two blocks instead of just one.
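
The two failure paths described above can be reproduced in a toy model. This is a sketch under stated assumptions: parity is plain XOR, the "flaky drive" is simulated by substituting a garbage block for what was originally written, and all names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def parity_is_consistent(data_blocks, parity):
    """The check that a parity-verifying read would perform."""
    return xor_blocks(data_blocks) == parity


def update_parity(old_parity, old_block_as_read, new_block):
    """Small-write parity update; it trusts whatever the drive returned."""
    return xor_blocks([old_parity, old_block_as_read, new_block])


def demo():
    good = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
    parity = xor_blocks(good)
    garbage = b"B?#B"  # what flaky drive 1 returns instead of b"BBBB"

    # A parity check on read would have flagged the stripe.
    detected = not parity_is_consistent(
        [good[0], garbage, good[2], good[3]], parity
    )

    # Without the check, the garbage is modified and written back, and the
    # new parity is derived from the garbage.
    new_block = b"b?#B"
    new_parity = update_parity(parity, garbage, new_block)

    # Later drive 0 fails and is rebuilt from the survivors.
    rebuilt_after_write = xor_blocks([new_block, good[2], good[3], new_parity])

    # Separate case: drive 0 fails while drive 1 is already returning garbage.
    rebuilt_from_flaky = xor_blocks([garbage, good[2], good[3], parity])

    return detected, rebuilt_after_write == good[0], rebuilt_from_flaky == good[0]


if __name__ == "__main__":
    print(demo())  # (True, False, False)
```

In both cases the rebuilt block for drive 0 is wrong even though drive 0's data was never touched, which is how one bad block becomes two.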

Need more? During recovery, read performance for a RAID5 array is degraded
by as much as 80%. Some advanced arrays let you configure the preference
more toward recovery or toward performance. However, skewing priority
towards performance during recovery will increase recovery time and
increase the likelihood of losing a second drive in the array before
recovery completes. Losing a second drive in a RAID5 array will result in
catastrophic data loss. RAID10, on the other hand, will only be recovering
one drive out of 4 or more pairs, with performance ONLY of reads from the
recovering pair degraded, making the performance hit to the array overall
only about 20%! Plus there is no parity calculation time used during
recovery - it's a straight data copy, so recovery time is much shorter.

What was that about losing a second drive? Well, with RAID10 there is no
danger unless the one surviving drive of the recovering mirror pair also
fails, and that's 80% or more less likely to happen than that any other
drive in a RAID5 array will fail! And since most multiple drive failures
are caused by undetected manufacturing defects, you can make even this
possibility vanishingly small by making sure to mirror every drive with one
from a different manufacturer's lot number.
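
A rough way to sanity-check the "second drive" comparison is a naive independence model. Everything here is an assumption for illustration: drives are taken to fail independently with the same probability during the rebuild window (the post itself points out that real failures cluster by manufacturing lot), and the 1% figure is hypothetical.

```python
def p_fatal_second_failure(fatal_candidates: int, p_drive: float) -> float:
    """Chance that at least one fatal candidate fails during the rebuild.

    Assumes independent failures, each with probability p_drive.
    """
    return 1.0 - (1.0 - p_drive) ** fatal_candidates


def relative_reduction(raid5_drives: int, p_drive: float) -> float:
    """How much less likely a fatal second failure is on RAID10 than RAID5.

    RAID5: any of the remaining raid5_drives - 1 drives is fatal.
    RAID10: only the single mirror partner is fatal.
    """
    raid5 = p_fatal_second_failure(raid5_drives - 1, p_drive)
    raid10 = p_fatal_second_failure(1, p_drive)
    return 1.0 - raid10 / raid5


if __name__ == "__main__":
    for drives in (5, 6, 8):
        print(drives, round(relative_reduction(drives, p_drive=0.01), 3))
```

Under this simple model the reduction is roughly 75% for a five-drive RAID5 array and grows as the array gets wider, which is in the same ballpark as the figure quoted above.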

I can hear you say, "This scenario does not seem likely!" Unfortunately it
is all too likely. It happened to me and I have heard from several other
DBAs and SAs who have similar experiences. My former employer lost 50
drives over two weeks when a batch of 200 IBM OEM drives began to fail. IBM
discovered that the single lot of drives would have their spindle bearings
freeze after so many hours of operation. Fortunately, due in part to RAID10
on our IDS chunks and in part to a herculean effort by DG techs and our own
people over 2 weekends, no data was irretrievably lost. HOWEVER, one RAID5
filesystem was a total loss after a second drive failed during recovery.
Fortunately everything was backed up on tape, but that filesystem was down
for several hours, causing 1500 developers to twiddle their thumbs for most
of a day. That one internal service outage of only a few hours cost more in
lost productivity than the extra cost of using RAID10 for all of those
filesystem arrays!

Conclusion? For safety and performance favor RAID10 first, RAID3 second,
RAID4 third, and RAID5 last! The original reason for the RAID2-RAID5
specifications was that the high cost of disks was making RAID1, mirroring,
impractical for many organizations. That is no longer the case! Drives are
commodity priced; even the biggest, fastest drives are cheaper in absolute
inflated dollars or adjusted dollars than drives were then, and cost per MB
is a tiny fraction of what it was. Does RAID5 make ANY sense anymore?
Obviously I think not.

To put things into perspective: If a drive costs $1000US (and most are far
less expensive than that) then switching from a 4 pair RAID10 array to a 5
drive RAID5 array will save 3 drives or less than $3000US. What is the cost
of overtime, wear and tear on the technicians, DBAs, managers, and
customers of even a recovery scare? What is the cost of reduced performance
and possibly reduced customer satisfaction and confidence? Finally, what is
the cost of lost business if data is unrecoverable? I maintain that the
drives are FAR cheaper! Hence my mantra:

*NO RAID5! NO RAID5! NO RAID5! NO RAID5! NO RAID5! NO RAID5!*
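
For completeness, the drive-count arithmetic behind the $3000US figure is sketched below. The function names are hypothetical, and the $1000US price is the deliberately high figure used in the post rather than a real quote.

```python
def drives_saved(raid10_pairs: int, raid5_drives: int) -> int:
    """Drives saved by replacing a RAID10 array with a RAID5 array."""
    return 2 * raid10_pairs - raid5_drives


def max_savings_usd(raid10_pairs: int, raid5_drives: int, price_usd: int) -> int:
    """Upper bound on hardware savings at the given per-drive price."""
    return drives_saved(raid10_pairs, raid5_drives) * price_usd


if __name__ == "__main__":
    # 4 mirrored pairs and a 5-drive RAID5 both offer 4 drives of capacity.
    print(drives_saved(4, 5))           # 3
    print(max_savings_usd(4, 5, 1000))  # 3000
```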

Art

