RAID – Time for a better way?

By | October 20, 2006

I see for myself, and hear from colleagues all the time, about increasingly long RAID rebuild times, especially on the larger and slower SATA disks that we see more and more today. I'm also starting to see and hear about some companies who are using these larger SATA disks for their tier 1 storage and therefore running some of their business critical applications on them. Should we believe the scaremongering about how risky this is? I have some thoughts on that but I’ll save them for another post. What I've been thinking about recently, and now as I write this post, is how long it will be before we see the end of RAID technology, or at least RAID as we currently know and love (or hate).

I was on the train from Leeds to London earlier in the week, travelling to Storage Expo, and thought I'd try and use the quiet travel time to dream up some wonderful replacement for traditional RAID. Unfortunately I fell asleep and arrived in London no closer to solving the problem.

A little while after arriving I was attending a keynote presentation titled “Standing on the shoulders of giants: the future of storage technology” when a question from another member of the audience stirred my memory. (I digress slightly for those who weren’t in attendance: the seminar had a 6 person panel made up of one representative each from EMC, Pillar, IBM, HP, NetApp and HDS, to whom questions could be posed.) Anyway, I thought "Magic!! Who better to ask than these giants of the industry?" So I took the opportunity and asked what their thoughts were on the topic and what they are doing to overcome the problem.

I was actually quite pleased at how interested the panel seemed to be. Hu Yoshida from HDS took up the question and explained about RAID 6 (dual parity) and deferred rebuilds. He also mentioned how old RAID technology is and that a replacement may be needed, but said that he didn’t know what that replacement technology might be (don’t suppose he would have told me even if he did/does know). The guys from NetApp and Pillar Data Systems made similar comments about RAID double parity and techniques for reducing the impact of rebuilds on other LUNs and components. These are all ways to help relieve the pressure a little, but nobody offered what I would consider a real solution.
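
To make the rebuild problem concrete, here is a minimal Python sketch of single-parity (RAID 5 style) reconstruction. It is not how any of the vendors above implement rebuilds, and it does not show the second parity that RAID 6 adds; the function names, block contents and throughput figure are all illustrative assumptions. The point is simply that every surviving disk has to be read in full to regenerate the failed one, so rebuild time grows with disk capacity.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(stripe, failed_index):
    """Regenerate one lost block from all the surviving blocks in a stripe.

    `stripe` holds the data blocks plus one XOR parity block. Because the
    XOR of every block in the stripe is zero, the missing block equals the
    XOR of the survivors.
    """
    survivors = [block for i, block in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


def estimate_rebuild_hours(capacity_gb, rebuild_mb_per_s):
    """Rough lower bound: time to write one whole disk at a sustained rate.

    The rate is an assumed figure; on a loaded array the real rate is
    usually far lower because host I/O competes with the rebuild.
    """
    return capacity_gb * 1000 / rebuild_mb_per_s / 3600


if __name__ == "__main__":
    data = [b"disk0abc", b"disk1def", b"disk2ghi"]  # hypothetical data blocks
    stripe = data + [xor_blocks(data)]              # append the parity block
    assert rebuild_missing(stripe, 1) == data[1]    # "disk 1" recovered
    print(estimate_rebuild_hours(500, 20))          # 500GB at an assumed 20 MB/s
```

Doubling the capacity in `estimate_rebuild_hours` doubles the estimate, which is the whole complaint in a nutshell: disks keep getting bigger much faster than they get quicker.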

So I guess what I'm saying is that these so-called “giants” seemed to agree that a replacement for RAID might be needed… blah blah blah… but a real solution does not appear to be known. So, if that's not an opportunity then I don't know what is!

Therefore, if anyone has any good ideas, please feel free to contact me, share the idea and join with me in forming a startup company that patents and sells the new technology and then gets bought by one of the big fish for an exorbitant amount of cash 😉

Mackem

Author's note: I'm sure some of these people are giants of the industry, as are their respective companies. I do not use the term in a derogatory way at all. However, I can't help but think that when I think of giants I don't think of innovation; instead I think of muscle and loud voices etc, as well as of course beanstalks 😉. As one other member of the audience mentioned, it always seems to be the smaller or startup companies that have the good ideas (didn't go down well with Steve Legg from IBM).

7 thoughts on “RAID – Time for a better way?”

  1. Hubert Yoshida

    Hello Snig, I didn’t realize it was you that asked the question. As you can see I am a fan of your blog. I agree that RAID with larger disks is becoming a problem. Users often get hit twice, once for the rebuild to spare and again when the disk is replaced and the data is copied back to the original RAID group. Besides using RAID 6 to defer the rebuild activity, we can leave the data on the spare and reorganize it later. If you use our virtualization, the rebuild is offloaded to the external storage where the failure occurred, so it is isolated from affecting the other users.
    Aside from using the most reliable disks we can find, this is the best we can do with RAID today.

  2. c2olen

    Mr Yoshida,
    I believe it was Mackem that wrote the post, not Snig.

    Although we do not use HDS storage (yet 😉 ) I can relate to the RAID5 rebuild durations. We experience those on a regular basis on our SATA disk pools, which we use for backup purposes only. Now that we have started using 500GB drives, the rebuilds really are taking a very long time. I believe it takes over 8 hours by now on a loaded system. And indeed, the copy-back takes equally long.
    As disk manufacturers are challenging each other to increase the disk capacity year after year, there is no light at the end of the RAID tunnel, as far as I am concerned.

    At home on the other hand, I tend to use disks no larger than 120GB (RAID1), because I really do not have the need for more. What I see happening around me though, is that more and more people are storing large amounts of data in their personal computers or external storage media. Backup still doesn't come to their minds, and they don't realize what happens to their home videos and digital stills when a failure strikes their hard drive.
    People tend to think it’s very cool to have extremely large hard disks, which they really don’t need.

    Hey Mackem,
    I will contact you soon to discuss the terms of our cooperation. I have the best idea on how to replace RAID. Duh, I wish…

  3. mackem

    Hi gents,

    Hu, thanks for taking up my question at Storage Expo. I enjoyed the show but was a little disappointed at the “Standing on the shoulders of giants” keynote: there was nowhere near enough time to quiz you guys on the future of storage technologies. I've also just read your blog on the RAID topic, very interesting.

    c2olen, I look forward to working with you on our new startup 😉
    PS. Enjoying your posts!

  4. snig

    Without a better solution available today, I can vouch for RAID6. It works well and reduces the risk factor of the disk rebuilds. We just simply don’t worry about the rebuilds anymore.

    I’ve got some ideas for you guys as well. Let me get finished with this replication rollout and we can start melding ideas.

  5. mackem

    I was surprised to read that EMC has left out RAID 6 support in its recent product launch/refresh. The article describes them as “..claiming it still comes at the expense of performance and capacity utilization”. Full article link below:

    http://www.byteandswitch.com/document.asp?doc_id=108055&f_src=byteandswitch_default

    I personally hate it when computers tell you what is good for you. No offense, but I’ll be the judge of what is good for me… if I want to do RAID 6 then I will do it, if I don't then I won't. EMC are not even giving the choice.

    It's not pick-on-EMC time or anything like that; I know they are very good in other areas where they allow you to tune things how you want them etc. However, I'm annoyed that a major player like EMC has not noticed the market demand.

    They must not have their ears to the ground. Decision makers must be spending too much time in board meetings and not mixing with the people… they must not be reading blogs.rupturedmonkey.com!!! Honestly though, blogs like this are a great way to find out what your customers want.

  6. SanGod

    I know that EMC Clariion has a truly “Dynamic” spare, where the spare becomes the replacement and the new disk becomes the spare. The downside to that of course is that on the Clariion, back-end placement can be critical for performance reasons.

    I agree about the sparing: when a 300G drive fails and has a spare invoked, it can take some time to rebuild… to the point that the customer engineer sometimes shows up with the replacement drive long before the rebuild is complete. Then of course when the drive is replaced it has to rebuild back to the original drive anyway.

    I don’t have hot-spares configured in my DMX, and in fact I know a number of engineers at EMC who build configurations like this. I don’t feel like there is a return, since redundancy is lost anyway for the period of time the spare is rebuilding. I figure it’s better to have the production drive back in play as soon as possible and avoid the added cycles doing multiple rebuilds.

    However, that being said, if RAID 6 becomes available I probably won’t move in that direction. It’s just another excuse for disk manufacturers to sell more disks.

  7. Rob

    Funny how 7 years later this is still an issue.
    Often I get to talk to customers who are wondering if something is wrong, since their array is still rebuilding after a disk replacement.
    Of course we’ve got some little tricks to speed up the process (raise the rebuild priority, make more array cache memory available for writing), but even with medium sized (346GB seems to be standard nowadays) SAS drives we now tell our customers it can easily take up to a day (on a server that is in production).

    The only thing that really seems to help cut down the rebuild time is the fact that nowadays a hard drive in predictive failure can already be taken over by the spare, and the replacement drive will become the new spare (saving one rebuild).

    Cheers,

    Rob
