Who let the cow in?

By | July 3, 2007

Up until about 6 months ago I knew that the Hitachi range of enterprise storage had Copy On Write (CoW) snapshot functionality, but I had never seen it used, or even heard of it being used, anywhere... that's right, anywhere!  In fact, any time I talked about it with my peers, we ended up authoritatively agreeing that it was not a technology suited to enterprise storage.

Add to that a training course I attended, where I wanted to play around with it (CoW) on a USP after class had finished, only to find myself, the course trainer and the lab manager fumbling through the GUI trying to find the button to create our first CoW pair – we later found out from the manual that pairs could only be created from the CLI!

So, in my admittedly small world, as recently as maybe 6 months ago, nobody used it or even knew how to.

Remember that I'm talking about enterprise storage here.  I'd used CoW technology a lot on midrange kit like the EVA, just never on a honkin' enterprise array.

Fast forward to now, where I've seen two back-to-back USP installations using CoW and consider myself a world authority on the technology 😉

So what has changed?  Was I wrong 6 months ago?  Is CoW snapshot technology suited to enterprise storage after all?

If I take a look at the installations I've recently worked on, they both still use full block copies, using ShadowImage, to back up their critical applications such as Oracle and Exchange.  One is using CoW for backing up less critical systems, and the other is using it as a space-efficient method for keeping multiple daily file server backups (with a 1 hour granularity).
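
To make the "space efficient" part concrete, here is a minimal Python sketch of how a CoW snapshot differs from a full block copy. It is a toy model under stated assumptions, not how the USP implements it: the `Volume` class and its methods are hypothetical names, blocks are plain Python values, and each snapshot keeps its own preserved copies (a real array may share preserved blocks between snapshots in a pool).

```python
class Volume:
    """Toy primary volume with copy-on-write snapshots (hypothetical model)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live data on the primary volume
        self.snapshots = []         # one dict per snapshot: block index -> preserved old data

    def snapshot(self):
        # Taking a snapshot copies no data; it starts as an empty map.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        # Copy on first write: before overwriting, preserve the old block
        # for every snapshot that has not already saved this block.
        for saved in self.snapshots:
            if index not in saved:
                saved[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, snap_id, index):
        # Changed blocks come from the preserved copies; unchanged blocks
        # are still read from the primary volume.
        saved = self.snapshots[snap_id]
        return saved.get(index, self.blocks[index])

    def pool_blocks_used(self):
        # Space consumed by all snapshots together.
        return sum(len(saved) for saved in self.snapshots)


volume = Volume(["v0"] * 1000)
first = volume.snapshot()
for i in range(10):
    volume.write(i, "v1")       # 10 blocks change after the first snapshot
second = volume.snapshot()
for i in range(5):
    volume.write(i, "v2")       # 5 of them change again after the second

print("blocks held by snapshots:", volume.pool_blocks_used())
print("blocks two full copies would need:", 2 * len(volume.blocks))
```

In this model the snapshots only ever hold blocks that changed after they were taken, which is why many hourly snapshots of a slowly changing file server can cost far less capacity than the same number of full copies. The flip side, also visible in `read_snapshot`, is that every unchanged block still lives only on the primary volume.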

So with this in mind it kind of looks less like CoW is creeping into the enterprise space, and more like enterprise storage is creeping down into the space of midrange storage.

Interestingly, both CoW implementations have been on small USPs.  One is a single cabinet (DKC only) and the other is a 2 cabinet installation and both have external storage hung off the back.  And in my experience, this trend of buying smaller USPs, instead of traditional midrange kit, is becoming more and more common, and I suppose it makes a lot of sense.  For example, it provides all of the following –

1. Fits today's needs, with enterprise class features for a moderate price
2. Provides a wide range of functionality and therefore flexibility (midrange and enterprise features)
3. Allows for large internal expansion
4. Allows you to prolong the life of older storage by hanging it off the back, if you should so desire
5. Allows for future expansion via external storage

Compare this to purchasing a midrange box.  Although a midrange box would normally still be cheaper, it would provide maybe only one, in fact only half of one, of the above features (number 1).

Add to this the diskless NSC from Hitachi.  Providing the functionality of enterprise AND midrange (CoW 😉 ) in the NSC controller and allowing you to hang any Tom, Dick or Harry's disks off the back.

So where does midrange storage fit in?  3PAR?

Nigel 

NB.  I see that Anil is doing the 3PAR thing this month <link here>  

9 thoughts on “Who let the cow in?”

  1. stephen2615

    Nigel,
     
    Interestingly enough, I have used COW for a very large database on a USP 1100 just because it was so large.  SI was used for a consistent daily backup because it took so long to back up to tape, but COW was used to offer smaller time window "snapshots".  Mind you, I never actually used it to do any sort of recovery, but it was there and made management feel all warm and fuzzy.
     
    I must admit the training on it was less than extensive.   The student guide on COW was not too bad but we had no lab doco.  It was given as an afterthought at the end of the last day because I asked for it.  I had to write down some steps in the back of my notebook.  Let me give what I was given.
     
    1. Identify LDEVs for pool disk
    2. Create pool
    3. Identify P-VOL for COW
    4. Create V-VOL (max 64 for each P-VOL)
    5. Use SI interface -> select P-VOL and assign V-VOL as secondary volume
    6. Manage from SN or CCI
     
    Perhaps COW normally falls in the same category as Serverless Backup, which is too complex to get to work.  It surely would be a wonderful thing if it did work.
     
    Stephen
     
     

  2. the storage anarchist

    FWIW – Symmetrix DMX has supported so-called CoW replicas (we call them space-saving Snapshots) with TimeFinder/Snap since 2004 (Enginuity 5670+). Snaps can be configured via Solutions Enabler, CLI, both the ControlCenter Symmetrix Manager and Symmetrix Management Console GUIs, and even SMI-S (I think).
     
    A significant percentage of our enterprise customers use these for a variety of applications, from backup/recovery points to test & dev against "live" data.

  3. Chris M Evans

    For both the EMC and HDS versions I’d like to see an idea of the performance degradation from using the technology.  Clearly for writes the penalty is on saving the "about to be changed" block and for reads the risk is that the COW and original are both required to be read and potentially with different profiles.  Up to now I’ve never recommended COW due to the uncertainty on performance.
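
As a rough illustration of the write and read penalties described above, here is a small Python sketch that counts back-end I/Os for a single snapshot. It assumes the simplest possible design, where the old block is copied synchronously before the host write completes; it is not a model of how the USP or DMX actually schedule the copy, and `CowDevice` and its method names are hypothetical.

```python
from collections import Counter


class CowDevice:
    """Toy device with one CoW snapshot, counting back-end I/Os (hypothetical model)."""

    def __init__(self, n_blocks):
        self.primary = [0] * n_blocks
        self.pool = {}               # block index -> preserved data for the snapshot
        self.snapshot_active = False
        self.io = Counter()          # back-end I/O counts by kind

    def take_snapshot(self):
        self.pool.clear()
        self.snapshot_active = True

    def host_write(self, index, data):
        # First write to a block after the snapshot: read the old block
        # and save it to the pool before overwriting (assumed synchronous).
        if self.snapshot_active and index not in self.pool:
            self.io["primary_read"] += 1
            self.pool[index] = self.primary[index]
            self.io["pool_write"] += 1
        self.primary[index] = data
        self.io["primary_write"] += 1

    def snapshot_read(self, index):
        # Changed blocks are served from the pool, unchanged blocks from
        # the primary, so snapshot reads compete with production I/O.
        if index in self.pool:
            self.io["pool_read"] += 1
            return self.pool[index]
        self.io["primary_read"] += 1
        return self.primary[index]


device = CowDevice(100)
device.take_snapshot()
device.host_write(3, 1)    # first write: 1 read + 2 writes on the back end
device.host_write(3, 2)    # repeat write: 1 write only
device.snapshot_read(3)    # served from the pool
device.snapshot_read(4)    # served from the primary volume
print(dict(device.io))
```

Under these assumptions the first write to each block costs three back-end I/Os instead of one, and a snapshot read lands on either the pool or the primary depending on whether the block has changed, which is the mixed read profile mentioned above.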

  4. the storage anarchist

    With the latest code release, DMX minimizes the write performance impact of the Copy on Write overhead by scheduling the copy to occur asynchronously to the write that initiates the copy – an approach that leverages the large global cache architecture and that wouldn’t work as well with most mid-tier platforms. (a technique called ACoFW – Asynchronous Copy on First Write…)
     
    Read performance isn’t really impacted all that much with CoW Snaps, especially if the pre-fetch algorithms are appropriately tuned (as we’ve done for DMX). For sequential reads, it doesn’t really matter where the "next" track is actually stored – the trick is to get the read queued to the drive before the read comes from the host. And random I/O is virtually unaffected – another feature of the DMX pre-fetch algorithms.
     
    BTW – The real limitation of CoW replicas isn’t so much performance. Instead, you have to balance the capacity savings you get for multiple replicas against the availability risks of having multiple LUNs dependent upon the same physical devices for all the data that has NOT changed…a double-drive failure in the primary RAID 5 group(s)  could not only cost you your primary logical device, but ALL of your recovery points as well. RAID 6 is a reasonable insurance policy against that risk.
     
    Note that both wide-striped Thin Provisioning and DeDup technologies create similar "hyper-dependencies" of a large number of logical devices – here too one must balance the risks of a dual drive failure within a single RAID group…if it should ever happen, you’ve lost a portion of virtually every single logical device in your system. For CoW Snapshots, the high probability risk of data loss might be acceptable; the risks may not be justifiable if you’re talking about the primary copy of your LUNs in a thin provisioned deployment.

  5. Nigel (mackem)

    I've also shared the same hesitance over the availability, or should I say lack of availability, of the snapshots if the primary/source volume fails.  Although on enterprise kit the primary volume is unlikely to fail because of the storage subsystem, I certainly would not like to have to tell a customer that they had also lost all of their snapshots!!  That said, I'm starting to warm to it for certain purposes.

    As for the ACoFW that Barry mentions on the DMX (let me rush to the patent office to copyright that catchy acronym) I’m not actually sure how the USP handles these writes.  I would imagine they are also handled asynchronously although hand on heart I can’t say for certain – so much for me being a world authority on CoW 😉

    And on the point of serverless backup.  That’s another one that I have never seen done.  I know Snig was looking into this a while ago but I never asked if he got it going.

  6. Anil Gupta

    Nigel,

    I don't think there is a debate between enterprise or midrange any longer. More and more data is being stored on midrange class and more and more performance is being extracted from enterprise class. Also, the trend you are seeing with smaller "enterprise class" being selected for larger "midrange class" may have to do more with requirement of the performance vs. storing data.

    IMO, the debate has moved on to scale-up vs. scale-out in storage. With my interest in 3PAR, Isilon, Amazon S3 and Google storage, you know where my bias lies. 😉 Did you hear that IBM bricks will be productized by a separate company?

    As for availability, there is no magic bullet or one solution for all; all options, whether snapshot vs. split-mirror or fast cows vs. fast and slow cows, have their own pros and cons.

    Anil

  7. Nigel (mackem)

    Anil,

    I did see an article on The Register re the IBM bricks.  My gut feeling on that is that it's a good thing (as long as enough cash is there to fund it going forward) as I'm not a fan of IBM storage.  They don't seem to put the effort in – the DS range being a pair of AIX servers and some disk shelves and much of the rest of the range being rebadged Engenio and NetApp……

    Nigel

  8. billy bathgates

    Not sure why CoW would be seen as intrinsically non-suitable for the enterprise. Shouldn't a CoW volume be just as reliable as the parent volume, assuming they share the same raid-level  (most of my experience is with EVA snapshots, which are CoW, but can have less redundancy than the parent volume if desired) ?

    Nigel, it's only the high-end DS (8300 I think is the model) which has the p570 aix-based controllers.  I don't think the baby-shark (ds6800) is engenio based, that's the ds4xxx series.

  9. Nigel (mackem)

    Billy,

    I think the main point about suitability for enterprise customers was the fact that the snapshot is reliant on the primary volume.  If the primary volume fails then so do all snapshots.  And although the Hitachi boxes allow for up to 64 snaps per primary volume (that is a marketing number), people often have between 10 and 20 or so.  Whereas a full block level copy, like a snapclone on the EVA, is totally independent.  This is more suited to the reliability requirements of enterprise systems.

    I'm also not a big fan of using snapshots for backups unless a lot of data has already been copied out to the snapshot.  Otherwise you are still placing a heavy read load on your primary volume.
