The Full "Extent" of Dynamic Provisioning

By Nigel | March 10, 2009

Recently Hitachi added Zero Page Reclaim functionality to its Dynamic Provisioning (DP) on the USP V.  This feature is one of the items on a fairly long list of enhancements on most people's DP wish lists.  Other improvements are hopefully in the pipeline. 

In this post I'm going to talk about Zero Page Reclaim, as well as a possible and interesting shift towards the "extent" as the basic unit of the entire array…

Dynamic Provisioning 101 (I'll keep this bit short)

When I talk about dynamic provisioning (DP) I am also referring to Thin Provisioning.

Under the hood of any DP implementation is a basic construct that I will refer to in this post as the "extent".  Each vendor has a different name for it: 3Par calls it a "chunk" and Hitachi calls it a "page".  Not only do most vendors have their own names for their extents, they all have their own ways of implementing them.  One obvious and much debated difference is the size of each vendor's extent. 

This extent, call it what you will, is the basic unit of allocation for all DP implementations that I know of.  Essentially, each time a host writes to a DP (thin) volume, the array will allocate one or more extents.  As this post is not intended to be a tutorial on DP I'll leave the theory there and move on to the meat.

Zero Page Reclaim: How it works

As mentioned, extent size varies between vendors.  Many vendors have an extent size smaller than 1MB, while Hitachi has fixed its extent size at 42MB (more of a "book" or even a "collection" than a "page", but such debates are old news now so I won't go into them here).

When an array performs a Zero Page Reclaim operation on a DP volume, it searches the allocated extents for that volume looking for extents that have no data.  Or to be more correct, it searches for extents that contain only zeros.

Sidestep:  Most arrays will zero out a volume when it is RAID formatted, basically writing zeros to all blocks comprising the volume.  This helps the XOR parity calculation.

Any extent that the array finds to be comprised entirely of zeros is assumed to be unused, and the association between that extent and the volume is broken.  This has the effect of placing such extents back into the Free Pool.  This has obvious capacity-saving benefits and is a useful tool for any thin provisioning array to have up its sleeve.
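The scan-and-detach logic described above can be sketched in a few lines of Python. This is a toy model, not any vendor's implementation: `ThinVolume`, `extent_map`, `free_pool` and `EXTENT_SIZE` are all invented names for illustration.

```python
# Toy model of Zero Page Reclaim: a DP volume maps logical extent indexes
# to backing extents (bytearrays). An extent made entirely of zeros is
# assumed unused and detached from the volume. Names are illustrative only.

EXTENT_SIZE = 42 * 1024 * 1024  # Hitachi-style 42MB "page"; many vendors use <1MB

class ThinVolume:
    def __init__(self):
        # logical extent index -> backing extent (absent = never allocated)
        self.extent_map = {}

def zero_page_reclaim(volume, free_pool):
    """Scan a volume's allocated extents; any extent containing only zeros
    is detached from the volume and returned to the Free Pool."""
    reclaimed = 0
    for index, extent in list(volume.extent_map.items()):
        if not any(extent):                # every byte in the extent is zero
            del volume.extent_map[index]   # break the extent<->volume association
            free_pool.append(extent)       # extent goes back to the Free Pool
            reclaimed += 1
    return reclaimed
```

Note that the scan has no notion of "deleted" versus "live" data: all it can see is bytes, which is exactly why file system behaviour matters so much further down.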

How useful will this be?

Well, I guess only time will tell, but it's certain to have some benefits.

How useful this functionality is depends on a few factors, two of which are extent size and file system behaviour.

Extent size.  It would seem, at first glance, that the smaller your extent size, the more opportunity there is for reclaiming space.  The theory is that there is more chance of finding extents comprised entirely of zeros if your extent is relatively small.  Conversely, you are less likely to find multiple megabytes' worth of consecutive zeros, as would be required with a larger extent size.  However, it may not turn out to be that simple.  Read on…
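A rough simulation makes the point. Given the same scattered pattern of non-zero 4KB blocks on a volume, count how much a zero-scan could reclaim at different extent sizes (everything in blocks for simplicity; the numbers and layout are invented, not measured from any array):

```python
# Why extent size matters: an extent is only reclaimable if *every* block
# in it is zero, so one stray data block "pins" the whole extent.

def reclaimable_blocks(data_blocks, volume_blocks, extent_blocks):
    """data_blocks: set of block numbers holding non-zero data.
    Returns blocks reclaimable when extents span extent_blocks blocks."""
    reclaimable = 0
    for start in range(0, volume_blocks, extent_blocks):
        extent = range(start, min(start + extent_blocks, volume_blocks))
        if not any(b in data_blocks for b in extent):  # extent is all zeros
            reclaimable += len(extent)
    return reclaimable

# One non-zero block every 256 blocks (sparse, scattered data):
data = set(range(0, 65536, 256))
for size in (8, 64, 512):
    print(size, reclaimable_blocks(data, 65536, size))
```

With this (deliberately worst-case) scattering, 8-block extents reclaim most of the volume, while 512-block extents reclaim nothing at all, because every large extent happens to contain at least one data block.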

File System Behaviour.  One point in relation to file systems is that many of them do not zero out space when a file is deleted.  Instead they take the lazy approach and simply mark the blocks as free, leaving the original data in place. 

Now while this soft delete behaviour may come to our rescue when we want to recover a file we accidentally deleted, it doesn't help us with Zero Page Reclaim.  Put another way, freeing up 150GB of space on a volume does not normally (it depends on your file system's behaviour) write zeros and make the space eligible for Zero Page Reclaim.  See the diagram below –

[Diagram: a file is deleted but the file system only marks its blocks as free, so Extent 1 still holds non-zero data]

The above diagram is an oversimplification, but it illustrates the point.  For Extent 1 to be a candidate for Zero Page Reclaim, the file system would have to hard delete (zero out) all deleted data, as shown in the diagram below – 

[Diagram: the same extents after a hard delete, with Extent 1 zeroed out and therefore eligible for reclaim]
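The soft-delete versus hard-delete distinction can be made concrete with a toy file system (not any real one; `ToyFS`, `soft_delete` and `hard_delete` are invented for illustration):

```python
# A toy illustration of why soft deletes defeat Zero Page Reclaim: deleting a
# file normally only flips free-space bookkeeping, while a hypothetical
# "hard delete" also zeros the blocks so the array's zero-scan can see them.

class ToyFS:
    def __init__(self, blocks):
        self.disk = [bytearray(4) for _ in range(blocks)]  # 4-byte "blocks"
        self.allocated = set()

    def write(self, block, data):
        self.disk[block][:] = data
        self.allocated.add(block)

    def soft_delete(self, block):
        self.allocated.discard(block)   # mark free; stale data stays on disk

    def hard_delete(self, block):
        self.allocated.discard(block)
        self.disk[block][:] = bytes(4)  # zero the block so reclaim can see it

def extent_is_reclaimable(fs, start, length):
    # The array only sees bytes: an extent is reclaimable iff every byte is zero.
    return not any(any(fs.disk[b]) for b in range(start, start + length))
```

After a `soft_delete` the extent still fails the zero-scan even though the file system considers the space free; only after a `hard_delete` does it become reclaimable.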

In its current incarnation, and combined with present-day file systems, the best use case for Zero Page Reclaim is expected to be after migrating from classical thick volumes to new thin dynamic volumes.  This works well when you have, for example, a 500GB traditional thick volume to which only 200GB has been written.  When migrating this to a thin volume you will more than likely be able to reclaim the untouched 300GB at the end of the volume back to the Free Pool.

Databases and the like may work differently, depending on your database/application and OS.  If so, a smaller extent size may yield superior benefits here.

Array Maturity or Adolescence

Storagebod recently postulated the idea that the storage array may have reached "functional maturity" and that, as a result, the days of major changes in storage array functionality may be behind us, for a while at least (assuming I've understood Storagebod's comments). 

Obviously I have no access to the planet-sized brains in the heads of some of the guys who work at the storage vendors, never mind a crystal ball, so I may be way off the mark with this, but I tend to disagree.  Read on…

The Rise of the Extent

Another feature that I want to see added to DP aware arrays is the ability to migrate at the extent level. 

While there is mileage in the idea that pooling all of your RAID groups together and spreading the allocation of extents across all spindles will help avoid hotspots, exact mileage will no doubt vary.  The ability to monitor pool performance at the extent level, and to migrate extents between spindles within and across pools, would give great performance and management flexibility. 

It will be interesting to see what implications extent size has on the viability and performance of such extent-based migrations.

Migration at the LUN level is fine, but it's cumbersome and clunky.  The ability to move an extent to cooler spindles within a pool, or even from a disk pool to a pool comprised of EFDs (Enterprise Flash Drives), will really add functionality and value.
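A speculative sketch of what such extent-level tiering might look like (no shipping array worked this way at the time of writing; the function, the read counters and the thresholds are all invented for illustration):

```python
# Hypothetical extent-level migration planner: track recent random reads per
# extent and promote the hottest extents from spinning disk to an EFD pool,
# capped by how many free EFD extents exist. Purely illustrative.

def plan_migrations(read_counts, hot_threshold, efd_free_extents):
    """read_counts: {extent_id: recent random reads}.
    Returns extent ids to promote, hottest first, capped by EFD free space."""
    hot = [e for e, reads in read_counts.items() if reads >= hot_threshold]
    hot.sort(key=lambda e: read_counts[e], reverse=True)
    return hot[:efd_free_extents]

counts = {"ext-1": 50, "ext-2": 9000, "ext-3": 300, "ext-4": 7000}
print(plan_migrations(counts, hot_threshold=1000, efd_free_extents=1))
# → ['ext-2'] — only the single hottest extent fits on EFD
```

Working at extent granularity rather than LUN granularity is precisely what would make this attractive: only the hot 42MB (or smaller) pieces of a volume need to occupy expensive flash.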

I'm hoping, and thinking, that SSD/EFD is going to encourage a lot of changes to storage arrays over the next few years.  Today's storage arrays are highly tuned for spinning disk and will no doubt evolve in step with the uptake of SSD/EFD.

Personally I think it's an exciting time to be involved in storage, with lots of great things happening and lots still to be explored.

Nigel

7 thoughts on “The Full "Extent" of Dynamic Provisioning”

  1. Storage Pimp

    Nigel:
     
    You have just about hit the nail on the head.  The first storage array with the ability to migrate extents with high random reads to SSD/EFD devices on the fly, without user interaction, will be disruptive innovation. 
    I think 2-3 years from now 90% of arrays will utilize pooled storage.  I believe pooled storage outperforms logical devices confined to RAID ranks in just about every shared storage situation.  Thank you 3Par…
    Sun is already playing with this in their new NAS devices.  Rumors are EMC is playing with the ability in the DMX-5. 

  2. Nigel

    Then I say "Bring on the DMX-5!!!"

    To be honest I'm expecting big things from the DMX-5.  EMC will no doubt have learned a huge amount from their one year and counting of experience with SSD/EFD (and I think EFD will drive huge changes), but more than ever I think they need to keep innovating to stay ahead and maintain their number 1 spot.

    It's no longer a case of just Hitachi keeping EMC competitive and vice-versa; there are other players who are becoming relevant, and I think 3Par seems to be one of them, along with SVC, maybe XIV and of course NetApp…

    Even the big banks are now looking to make cost and energy savings, and are no doubt more willing to entertain bids from outside of the big 3.  So if the banks are, then everybody else sure as heck will be.

    My parents have often told me that there is no such thing as a "job for life" any more.  I wonder if the days of guaranteed customers, "customers for life" if you will, are coming to an end.

    But this is great; change and competition both fertilise innovation and bring out the best.  I think it's gonna be good.

  3. Martin G

    Extent level migration is probably the last significant change that we will see in the array. I think there is one more big array revision ahead of us, and I’m expecting this revision in the next twelve months.

  4. Nigel

    Martin, the only issue I see with that is that certain companies, EMC definitely being one of them, cannot let things settle like that without seeing their market share eaten up by cheaper competitors.  EMC and others at the top of the tree need to keep innovating and moving things forward. Unless they just buy up the likes of 3Par and anybody else who comes along 😉

  5. Pingback: The Green Machine, Asim Zaheer, Hitachi Data Systems » Blog Archive » Wasted Space
