XIV Deep Dive Discussion

By | March 19, 2010

Nobody ever seems to talk about IBM storage, it's just boring, right?

The above comment was arguably true until IBM acquired the XIV Nextra product just over two years ago.  This acquisition seems to have propelled the IBM storage portfolio back into the mainstream. People are actually talking about an IBM storage product!

In fact it's probably fair to say that XIV is one of the most intriguing, most misunderstood and most talked about storage platforms on the market.  There's certainly no shortage of claims being made from both sides of the fence – from IBM, and from the competition.  As a result it's hard to know who to believe.

To try and shed some light, I pulled together some of the most respected independent voices I could find, and got us all together on a call to openly discuss the pros and cons of XIV.  The result is 50 minutes of XIV technical talk – definitely one for your MP3 player.

People joining me on this edition of the show –

Alan Senior – XIV expert at a VAR
Matt Davis – www.techmute.com
Stephen Foskett – blog.fosketts.net

 

NOTE: Apologies up front for the poor sound quality when Alan speaks – he was a late addition to the show and didn't have a good Skype headset.  But his experience and contribution are hopefully worth it.

Enjoy the show – thoughts and comments welcome!

6 thoughts on “XIV Deep Dive Discussion”

  1. SRJ

    Nice work guys…enjoyed the podcast.  I was impressed by the fairness and lack of FUD.  Techmute did a great job of summarizing our Google Wave on the subject…
    In response to the question – "What are the pros of XIV?" – in addition to what was said, I would have just added a bit about the snapshot capability.  Their implementation is second-to-none in the industry and they do a very poor job of marketing this fact.  Like NetApp, there is no performance hit, but the XIV is not limited to 255 snapshots per volume.  Also, with XIV you can restore an older snapshot without destroying all of the subsequent snapshots, which is pretty cool.

  2. Nigel Poulton Post author

    Hi SRJ,

    Thanks for your comment, and I'm glad we came across as balanced and lacking FUD.

    I'd told the guys on the podcast that I expected it to run for ~30 minutes. It turned out to be about 50 minutes, so we didn't have time to mention everything. One other thing off the top of my head – I've been told that you can grow volumes that are participating in live remote replication: if you grow the source volume, the remote volume is grown automatically! Sounds like a great feature to me.

    I'd be interested to know what else you like about XIV.

    Nigel

  3. Hector Servadac

    Nigel… any chance you can transcribe this? It's very difficult to hear Alan…

  4. IvanE

    Not trying to be mean, but there are a couple of things I wanted to note on what is obviously the most discussed part – double failures:
    – Nigel, you're not missing anything 🙂 A highly utilized box under load will take over 30 minutes to rebuild after a hard failure. IBM's own performance papers show exactly that.
    – There was a mention of contacting the lab in case of a double failure. Nice idea, but… XIV does not stop I/O in case of a double failure. It's silent. So your applications will keep running and updating data until they actually hit the lost area. In most cases this will lead to inconsistency, rendering your data useless.
    – There was a discussion about the risk window. What is always overlooked is that you don't need two drives to fail – it's enough to have independent failures of a module and one drive within a 4-6 hour timeframe. You can't bring the module back until IBM's engineer comes around. The rebuild of a module takes those 4-6 hours. And the box keeps running even though data is lost (see previous point).
    – When talking about rebuild times in XIV compared to traditional RAID, IBM never mentions that with RAID you risk one disk group, while with XIV you risk the whole array. So with XIV the probability might be lower (due to faster rebuilds), but the impact in most cases is much, much higher.
    These are the reasons I voted "No, I don't like XIV, yes, I have tried it".

  6. Pingback: #EMCworld 2010 » XIV Recap
