VMAX Comes of Age

EMC surprised me last week.  On the 15th of December they released what is, in my opinion, the biggest uplift in functionality to their flagship Symmetrix product line in a very long time!  The thing that surprised me was that after nearly two years of talking it up, they sneaked it out with barely a sound from Hopkinton.  Not what I was expecting after 2+ years of development and nearly as long ranting about it.

Still, who am I to know.  So let’s take a look….

DISCLAIMER:  Opinions expressed and statements made in this article and elsewhere on this web site are my own personal opinions and do not represent those of any of my employers, past, present and future.  I do not guarantee the accuracy of anything said in this article.  I do not work for EMC and I am not an authority on VMAX, not an official one at least :-P

The following list is what I consider to be the major features of this hugely important release (we’ll go into more detail on some of them later in the article)

  1. FAST VP/Sub-LUN auto-tiering (we’ll deep dive this)
  2. VLUN Mobility for VP
  3. VMware VAAI support
  4. Federated Live Migration
  5. The usual performance uplifts

Enginuity 5875

Let’s just take a couple of quick seconds to cover the basics of the matter at hand. 

The crux of the updates that EMC released is contained within the 5875 code, or more properly, Enginuity 5875.  For those who may not know, Enginuity is the trademarked brand name for the microcode that has made the Symmetrix family of products tick for the last 20 or so years.  Enginuity is so fundamental to Symmetrix that you could call it the DNA of the Symmetrix family, and what makes VMAX a Symmetrix.

Microcode is always important to storage arrays, but with the current EMC ethos being that hardware is essentially commodity and that real value and differentiation are driven from software, Enginuity takes an even more important role in the life of VMAX.

All of the 5 points I listed above are features of Enginuity 5875.

So now that we’ve got the background covered let’s get into the good stuff….. 

FAST VP Deep Dive

Let’s cut right to the chase and get under the hood of FAST VP…..

There is no doubt that of the 50+ features and enhancements comprising the Enginuity 5875 release, FAST VP is the daddy!  Much of the industry, including EMC themselves, have been extolling the virtues of sub-LUN tiering for what feels like the last 50 years. 

However, despite waxing lyrical over its potential and investing so heavily in its development, EMC are the last of the so-called big three to come to market with the functionality.  IBM were first to market with sub-LUN tiering in the DS8700, and more recently Hitachi brought it to market in their VSP.  However, as we should all know, it’s not always about being first to market.

Anyway…… Let’s start with some basics but move quickly to the interesting stuff.

First up, FAST VP is an acronym for Fully Automated Storage Tiering for Virtual Pools.

As a quick high level summary, FAST VP is about bringing Sub-LUN tiering capabilities to VMAX – functionally speaking, it enables data in a VMAX device (I’m going to refer to them as LUNs for the remainder) to be spread across multiple tiers of storage within a single VMAX.  Up to three different tiers to be precise.  These three tiers might be SATA, FC and SSD.  Oh, and it’s dynamic – data can move up and down the tiers.

Next up, FAST VP is built upon, and requires, VP Pools.  So if you want to do Sub-LUN tiering on your VMAX, you first need VP Pools.

VP Pools is shorthand for Virtual Provisioning Pools.  VP and VP Pools bring backend wide striping to VMAX and are a fundamental enabling technology for important features such as thin provisioning and now FAST VP/Sub-LUN tiering.  For those of you who know Hitachi, I would say that VP is the feature equivalent of HDP in the Hitachi world.

FAST VP Extent Size

Now……. dynamically moving data up and down the tiers is usually done in fixed units of size.  Terminology in the industry varies, with some vendors referring to these units as pages, others as chunks and yet others as extents.  EMC uses the term extent, so in this article I will use the term extent.

In VMAX, VP has the notion of an extent, the size of which is 768K, 12 tracks, or 1,536 sectors, depending on how your brain works ;-)  And because FAST VP is built on top of VP, the influence of VP can be clearly seen when examining the FAST VP extent size….

What the competition are currently doing:  Hitachi took their 42MB extent (which they call a page – although many have referred to it as more of a book) that they use in their VP equivalent product and passed it directly through to their Sub-LUN code.  So Hitachi move data up and down the tiers in units of 42MB.  On IBM DS8, the Sub-LUN extent size is currently a whopping 1GB! 

But what does all of that mean?  Basically, on IBM DS8, if you have only a few KB of data that needs to be moved up the tiers, that few KB will move along with the rest of the 1GB extent it sits in – even if there is no need to move the remaining data.  Sound a bit wasteful?  The principle is the same for Hitachi, only the unit is much smaller at 42MB.

Clearly there are trade-offs to be made.  The larger the extent, the more wasteful of space, whereas the smaller the extent, the more overhead in tracking and managing each extent.  Then there are things like spatial and referential locality that need to be considered.
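
To put some rough numbers on that trade-off, here’s a quick back-of-the-envelope in Python (my own scribblings, not anything from a vendor datasheet) counting how many units an array has to track for a single 1TB LUN at each size:

```python
# Back-of-the-envelope: tracking overhead per 1TB LUN at various unit sizes.
# My own illustrative arithmetic - not from any vendor documentation.

LUN_SIZE_KB = 1024 * 1024 * 1024  # a 1TB LUN, expressed in KB

unit_sizes_kb = {
    "IBM DS8 extent (1GB)": 1024 * 1024,
    "Hitachi page (42MB)": 42 * 1024,
    "VMAX VP extent (768K)": 768,
}

for name, size_kb in unit_sizes_kb.items():
    print(f"{name}: ~{LUN_SIZE_KB / size_kb:,.0f} units to track per 1TB LUN")
```

Roughly 1,000 entries per TB at the IBM end, around 25,000 for Hitachi, and nearly 1.4 million if you tracked at the raw 768K VP extent – which is exactly why the next question matters.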

So, for VMAX……. clearly the 768K VP extent is too small to be realistic; the overhead would be huge.  So what did EMC choose for their extent size?

Well…… I kinda like what EMC have done here….. they have gone for a somewhat variable extent size, and I expect the future will bring even more variability.

A FAST VP Extent is composed of 480 VP extents.  Remember that a VP extent is 768K, therefore a FAST VP extent is ~360MB in size (480 x 768K).  And no I won’t bother quoting it in tracks and sectors ;-)  Importantly, all of the 480 VP extents that comprise a FAST VP extent are sequentially addressed within a volume.

Now then, while on the surface it might look like EMC have taken the easy option, picking a value roughly in the middle of Hitachi and IBM, it’s not actually that simple…

Each FAST VP extent is further broken down into 48 sub-extents, with each sub-extent being ~7.6MB (10 x 768K -or- 360/48).  Again, the VP extents that make up a sub-extent are contiguously addressed within the volume.

So when talking about FAST VP, we currently have two extent sizes -

  1. FAST VP extent = ~360MB (480 x 768K)
  2. FAST VP sub-extent = ~7.6MB (10 x 768K)
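
If, like me, you find it easier to see the relationships as code, here’s the arithmetic laid out (a trivial sketch of my own; the constants are the ones quoted above, with the standard 64K VMAX FBA track):

```python
# The FAST VP extent hierarchy, built from the numbers quoted above.
# Trivial illustrative arithmetic only.

TRACK_KB = 64                            # one FBA track = 64K
VP_EXTENT_KB = 12 * TRACK_KB             # 12 tracks = 768K = 1,536 x 512-byte sectors
SUB_EXTENT_KB = 10 * VP_EXTENT_KB        # 10 VP extents = 7,680K (the ~7.6MB above)
FAST_VP_EXTENT_KB = 480 * VP_EXTENT_KB   # 480 VP extents = 368,640K (~360MB)

assert VP_EXTENT_KB * 1024 == 1536 * 512          # the sector maths checks out
assert FAST_VP_EXTENT_KB == 48 * SUB_EXTENT_KB    # 48 sub-extents per FAST VP extent

print(f"VP extent:          {VP_EXTENT_KB:,}K")
print(f"FAST VP sub-extent: {SUB_EXTENT_KB:,}K")
print(f"FAST VP extent:     {FAST_VP_EXTENT_KB:,}K (~{FAST_VP_EXTENT_KB // 1024}MB)")
```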

To me, this is one of the reasons that being first to market isn’t all that important.  I’m sure most of the industry will consider the above values to be better than 1GB.  Remember, these are just my personal opinions ;-)

So why have the two values?

Well….. FAST VP may sometimes choose to promote at the FAST VP extent level (360MB).  This makes sense because an I/O to an extent will very often be followed by an I/O to the surrounding 7.6MB, and quite often by an I/O to the surrounding 360MB.  So moving up the tiers in the larger chunk may be good under some circumstances.  However, if subsequent I/Os to the surrounding 360MB never actually happen, FAST VP can clean itself up and demote the FAST VP sub-extents that never actually received I/O requests, while leaving the sub-extents that have received I/O in the higher tiers.

Also, having two extent sizes allows for a hierarchical, or filtering, approach.  At the highest level it could track the busiest FAST VP extents and then promote only the busiest sub-extents within those FAST VP extents, keeping metadata overhead to a minimum as well as making it quickly searchable.

Those are just two examples…
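
To make that second example concrete, here’s a toy sketch of what a two-level filter could look like.  To be clear, this is my own hypothetical illustration of the general technique – EMC haven’t published their actual FAST VP algorithm at this level of detail:

```python
# Toy two-level promotion filter: find the hottest FAST VP extents first,
# then promote only the hottest sub-extents inside them.
# Hypothetical illustration only - NOT EMC's actual FAST VP algorithm.

from collections import defaultdict

def pick_promotion_candidates(sub_extent_io_counts, top_extents=4, top_subs=8):
    """sub_extent_io_counts maps (extent_id, sub_id) -> weighted I/O count."""
    # Level 1: aggregate stats up to FAST VP extent level (cheap to track).
    extent_heat = defaultdict(int)
    for (extent_id, sub_id), count in sub_extent_io_counts.items():
        extent_heat[extent_id] += count

    hottest = set(sorted(extent_heat, key=extent_heat.get, reverse=True)[:top_extents])

    # Level 2: within only those extents, pick the busiest sub-extents.
    candidates = [key for key in sub_extent_io_counts if key[0] in hottest]
    candidates.sort(key=sub_extent_io_counts.get, reverse=True)
    return candidates[:top_subs]

# Quick demo with made-up stats:
stats = {(1, 0): 500, (1, 7): 300, (2, 3): 20, (9, 1): 450}
print(pick_promotion_candidates(stats, top_extents=2, top_subs=2))  # [(1, 0), (9, 1)]
```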

Does Extent Size Really Matter?

While my answer to that question today is a resounding “Yes”, I’m open to the possibility that in 5-10 years’ time the answer to the same question may be “No”.  Maybe, maybe not….  I’m just thinking about RAID…. not many people argue over RAID stripe sizes these days, and everybody pretty much agrees that RAID5 is RAID5 and RAID6 is RAID6 – with the exception of Alex McDonald from NetApp, who will be all over you if you suggest that NetApp’s RAID-DP is the same as other vendors’ implementations of dual parity RAID ;-)

Other Important FAST VP Stuff

Other important points, and points that show that EMC have not rushed this functionality to market, include -

  1. If your LUNs are thin provisioned, only the allocated extents are moved
  2. Any tracks that VMAX knows have Never Been Written By Host (NBWBH – see here) are not moved
  3. Any tracks that have been unmapped via space reclamation APIs, or tracks marked as should-be-zero (VMware VAAI use of WRITE SAME(0)), are not moved
  4. If VP extents are all zeros, they are not moved, but the metadata is tagged as belonging to the new tier so that when any writes come in to these extents they will be targeted at the correct tier.
  5. FAST VP ignores local copy and replication workload so that it does not skew stats.  However, FAST VP is aware of copy and replication jobs and will throttle backend movements with these in mind
  6. Data in cache is ignored; FAST VP is trying to help us with our cache misses, so including cache data would skew results and be counter-productive
  7. FAST VP weights operations accordingly.  For example, read-miss occurrences are more important than sequential writes – the sweet spot of SSD is still read-miss.
  8. Current, recent, and historical stats are monitored and used appropriately.  Current and recent stats might be more important in promoting data whereas historical stats might play more of a role in deciding whether or not to demote data…
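
Pulling a few of those rules together, a movement-eligibility check might look conceptually like the sketch below.  Again, this is my own pseudo-model of the behaviours listed above, not EMC’s code – the field names and weights are invented:

```python
# Conceptual model of the "don't bother moving it" rules listed above.
# Field names and weights are my own invention; this is not EMC's implementation.

from dataclasses import dataclass

@dataclass
class ExtentState:
    allocated: bool = True                # thin LUNs: only allocated extents move
    never_written_by_host: bool = False   # NBWBH tracks are skipped
    reclaimed: bool = False               # unmapped via space reclamation APIs
    marked_zero: bool = False             # e.g. VAAI WRITE SAME(0)
    all_zeros: bool = False               # retag metadata only, move no data

def is_movable(ext: ExtentState) -> bool:
    if not ext.allocated:
        return False
    if ext.never_written_by_host or ext.reclaimed or ext.marked_zero:
        return False
    if ext.all_zeros:
        return False
    return True

def weighted_score(read_misses: int, random_writes: int, seq_writes: int) -> float:
    # Illustrative weights only: read-misses matter most (the SSD sweet spot),
    # sequential writes least.  Cache hits and copy/replication I/O are not
    # counted at all, per the rules above.
    return 3.0 * read_misses + 1.0 * random_writes + 0.2 * seq_writes
```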

Much of the above is there to minimise the load on the backend during moves and is important to keeping FAST VP operations as non-disruptive as possible.  It is also evidence of a product that is at least semi-mature.  No doubt EMC could have come to market sooner, but at the cost of some of the above.  And as an infrastructure architect and designer, I can promise you that I personally would not appreciate any of my vendors rushing a product to market that was not ready – I’d only end up with customers not wanting to use solutions with my name against them.

There are also plenty of configurables – yes, I know that configurables are often like a two-edged sword: they’re useful, but you can easily cut yourself on them.  Either way, FAST VP allows you to specify windows in which data is gathered.  This allows you to ignore periods of time that might skew your data, such as batch runs and maybe overnight backups.  LUNs can also be pinned so that they are guaranteed to live in a particular tier.  Administrators can define movement windows so that backend movement, which is already tuned, only happens at times the administrator allows.
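
As a rough mental model of those knobs, think of something like the following.  The names and structure here are entirely my own invention – the real controls live in EMC’s management tools, not in a Python dict:

```python
# Hypothetical sketch of the FAST VP knobs described above.
# My own invented representation - not EMC's actual interface.

fast_vp_policy = {
    # Only gather performance stats inside these windows, so batch runs
    # and overnight backups don't skew the data.
    "performance_windows": [("Mon-Fri", "08:00", "18:00")],

    # Backend data movement is only allowed inside these windows.
    "movement_windows": [("Mon-Sun", "22:00", "06:00")],

    # Pinned LUNs are guaranteed to stay in their tier, whatever the stats say.
    "pinned_luns": {"01AB": "SSD", "02CD": "SATA"},
}
```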

FAST VP Good but Still Not Perfect

So far so good, but annoyingly for me, the built-in Windows-based Service Processor still has a role to play in the operation of FAST VP.  It’s not a huge role, but I believe that if the Service Processor dies, after a few hours promotions and demotions will stop.  Replacing the SP will bring FAST VP back to life and it will essentially pick up where it left off.  However, I do not like paying for a Tier 1/World-Class/Enterprise array, call it what you will, only to find that some of its functionality relies on a Windows machine attached over an internal LAN cable.

Functionality currently relying on the SP needs moving into the VMAX ASAP!

FAST VP a Silver Bullet?

While FAST VP is a key technology for VMAX, it is not a silver bullet.  Sub-LUN tiering promises much (I’ll do a short post on this over the Christmas and New Year period) BUT it brings its baggage.  Baggage includes complexities that none of us fully understand yet, as well as a whole load of architecture and design decisions that we are not familiar with.  We have been designing and deploying enterprise arrays in pretty much the same way for the last 10 years, but sub-LUN tiering means many of those design principles are no longer valid.  See here for a short discussion on this topic.

Before finishing up with FAST VP though, here are a couple more quick points -

  1. FAST VP movements will not cause VP Pools to become oversubscribed and will honour Pool Reserved Capacity settings configured by administrators.
  2. Yes, TDEVs are bound to VP Pools, and this is where extents are initially allocated from. However, over time a TDEV will have its extents spread across up to three tiers (equivalent to pools for the purposes of this point). If you need to move a TDEV back to a single VP Pool, VLUN Migration is your tool.
  3. There is a metadata overhead for every LUN that falls under the control of FAST VP.  Every TDEV (including meta-members) under FAST VP control requires a single extra cache slot (64K) for FAST VP related metadata.
  4. FAST VP is currently FBA only.
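
That single 64K cache slot per TDEV sounds trivial, but it’s worth sanity-checking what it adds up to at scale.  My own arithmetic, and the device counts below are just examples:

```python
# Back-of-the-envelope: FAST VP metadata cache cost at scale.
# One extra 64K cache slot per TDEV, meta-members counted individually.

SLOT_KB = 64
for tdevs in (1_000, 10_000, 64_000):
    print(f"{tdevs:>6,} TDEVs under FAST VP -> ~{tdevs * SLOT_KB / 1024:,.0f}MB of cache")
```

Somewhere around 625MB of cache for 10,000 devices – not nothing, but hardly a deal-breaker on an array of this class.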

Anyway…. that’s probably enough about FAST VP for now.  If you’re still reading, thanks for sticking with it and hopefully it’s been worth your while.

So on to some of the other stuff…

VLUN Mobility for VP

Virtual LUN, or VLUN for short, is the Symmetrix technology used to non-disruptively migrate LUNs between tiers of storage within a VMAX.  A key difference between this and FAST VP is that this is not sub-LUN – this is about moving an entire LUN from one tier to another – and it’s not automated.

BTW: Somebody please define “non disruptive” to me.

Non-disruptive LUN migration is not a new concept, nor is it a new technology to VMAX. However, prior to 5875 code, VLUN did not understand Virtual Provisioning (VP).

So basically, if you were building your VMAX arrays with VP Pools prior to the 5875 code and you wanted to migrate LUNs between tiers/pools, you were snookered.  Imagine, for example, you had provisioned a LUN from a SATA Pool and found out that it needed moving to an FC Pool for better performance – you were in a world of hurt!  Good news though: as of 5875 code VLUN is now VP aware and can move LUNs between tiers and pools!

While not an industry first, this is an extremely important tool in the toolbox of any VMAX administrator.

 

Renaming VP Pools

With 5875 code comes the ability to rename VP Pools. Prior to 5875 you only got one shot at naming your VP Pools – get it wrong and you were stuck with it.  You might laugh, but VMAX administrators across the world have been crying out for this. 5875 delivers.

Hey, sometimes it's the small things that count ;-)

Support for VMware VAAI

As of 5875 code, VMAX supports the VMware vStorage APIs for Array Integration (VAAI) that came in with vSphere 4.1.  Keeping it short, VAAI supports three hardware offload primitives: hardware assisted block zeroing, hardware assisted locking, and hardware assisted full copy.  I’ll leave the detail at that, but will say that support for these was a must for Enginuity 5875.
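
For reference, the three primitives ride on standard SCSI commands.  Here’s my own quick cheat-sheet – the opcodes are from the T10 SCSI specs, not anything EMC-specific:

```python
# The three vSphere 4.1 VAAI offload primitives and the SCSI commands they
# ride on.  My own cheat-sheet; opcodes per the T10 SCSI specs.

vaai_primitives = {
    "Block Zero (hardware assisted zeroing)": "WRITE SAME (opcode 0x93)",
    "ATS (hardware assisted locking)": "COMPARE AND WRITE (opcode 0x89)",
    "Full Copy (hardware assisted copy)": "EXTENDED COPY / XCOPY (opcode 0x83)",
}

for primitive, scsi_command in vaai_primitives.items():
    print(f"{primitive}: {scsi_command}")
```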

Federated Live Migration (FLM)

This feature allows non-disruptive in-family tech refresh migrations, DMX to VMAX.  If you have old DMX kit on the floor then this could be a god-send for your tech refresh woes.

I’ve had guys at EMC refer to this process as “totally non-disruptive” or “fully non-disruptive”.  So is that different to other operations that are branded as just “non disruptive”?   Are operations referred to as merely non-disruptive actually “almost non-disruptive” or “slightly non-disruptive”…..???  :-D

Data At Rest Encryption

Data At Rest Encryption, or DARE for short, is another interesting feature – oh, and another one that is available at apparently no performance cost, but requires a new Engine model with a slightly modified backend.  That aside, encryption is starting to rear its ugly head, and the DARE capabilities in VMAX with 5875 code put a tick in that box. Obviously data at rest encryption does not solve the end-to-end encryption issue and is by no means the answer to all encryption requirements, but it is another arrow in your quiver.

 

But What Is Still Missing?

With so much in this release of microcode (and I’ve only covered some of it) there is still a ton of stuff missing.  Notable absentees include -

  1. Still no virtualisation like that sported by the Hitachi USP, USP V and VSP.  This one sticks out like a sore thumb to me, and while I genuinely believe that EMC do not see much value in this technology as implemented by Hitachi, I am amazed that they haven't plugged the gap yet – even if just to keep the competition quiet.
  2. Customer impacting BIN file changes are almost gone, but a small number still annoyingly linger. Please make them all go away in the next version.
  3. GateKeepers.  The bane of many a VMAX architect’s or administrator’s life.  I’m not opposed to in-band management, I just don’t like the cumbersome, ugly way GateKeepers are implemented.  I know that the planet-sized brains at EMC could make the GateKeeper implementation much slicker if given the time.  Please improve your GateKeeper implementation.
  4. Metavolume complexities still exist.  The decision between the more flexible but lower performing concatenated metavolumes, versus the higher performing but less flexible striped metavolumes, still exists.  It’s about time such complexity was taken away.  Administrators just want to create and grow LUNs on the fly, without having to make the aforementioned decisions.  We want simplicity, performance and flexibility.  After all, it is 2010.
  5. The underlying RAID architecture remains fundamentally unchanged.  Today’s world of extremely large disks, and arrays with extremely large quantities of these large disks, requires a more modern approach to RAID.  I’m a fan of distributed/declustered/parallel RAID.  I hope the guys at Hopkinton are working on something, as the current RAID implementations seen in the likes of VMAX, VSP and DS8 occasionally cause me to wake up at night in a cold sweat.
  6. No FCoE.  Is this significant to the march of FCoE?  I’m pretty sure CLARiiON already supports FCoE, so surely they could have ported the knowledge over to VMAX if they thought it was important?
  7. Finally, as far as I’m aware, customers still cannot program the blue LED strip light on the front of VMAX arrays!  But if you want to see VMAX LEDs doing something cool, check this out at Chad’s site

Well…… that’s just over 3,000 words and completes a Deep Dive of another enterprise storage array (see here for Hitachi VSP).  I hope it’s valuable to people, and I wonder who is next…..?

If you have any thoughts I’d love to hear from you.  Your thoughts and comments are welcome here.  You can also talk to me about storage and technology in general on Twitter by tweeting me @nigelpoulton

Oh one last thought…. I wonder what Moshe thinks of VMAX ;-)

24 comments for “VMAX Comes of Age”

  1. Paul Robinson
    December 22, 2010 at 8:46 am

    Nigel
    Good post, looks like EMC have made some interesting changes. A few questions if you don't mind?
    Why was it released in stealth?? Seems like they should be shouting this from the rooftops!
    Have they changed the way the device alerts? Do we still have to use ECC to forward SNMP traps?
    Did they increase the max TDEV size of 240GB? Far too small for my organisation.
    What are your thoughts around chargeback? I bet that's interesting when devices are spread across multiple pools using different drive types!
    How does thin provisioning / over-subscription work with multiple pools? Hmmm, sounds complex…
    Thanks
    Paul

  2. Vinay Babu
    December 22, 2010 at 1:26 pm

    A very good post on the new features of the 5875 code. Still, the metavolume expansion issues should be taken care of ASAP!
    But is FAST VP as transparent as vMotion in VMware?

  3. December 22, 2010 at 2:34 pm

    Alex McDonald from NetApp who will be all over you if you suggest that NetApp’s RAID-DP is the same as other vendors implementation of dual parity RAID

    Surely will!

    Nice piece, Nigel.
     
    It still leaves me with several questions about the effectiveness of shovelling huge sub-LUN extents around the back end of an array. I haven't seen any research or stats that would indicate xGB or yMB is an appropriate size, and I (for one) would like to see some.
    NetApp use data from the running array to generate models for effective cache sizes, something any NetApp user can do to see if a NetApp Flex Cache solution will deliver improved and automatic performance gains. Is there a similar VMax tool?

    It all seems so complex too. And expensive; how's it priced? Ah well, presumably EMC know what they're doing. At least, for their customers' sakes, I hope so.

  4. December 22, 2010 at 7:39 pm

    Hi Paul,

    Thanks for dropping by and chipping in with some comments.

    To address your questions –

    Why a stealth release?  It will have a public launch in the New Year, but I'm not sure why they are waiting – maybe because Christmas isn't the best time to make announcements…?

    As for SNMP traps: I'm not 100% certain what you're asking, but I'm guessing you would like SNMP alerts to come directly out of VMAX and not require ECC?  To my knowledge, this has not changed and still requires ECC.

    Max TDEV size is also still 240GB.  I'm pushing EMC to increase this to something like 2TB.  Make sure you pressure them too, between us we might get somewhere!

    While I hear what you say about the complexities of chargeback and TP, I think we'll eventually get our heads around it.  There'll be a learning curve and I think it will always require skilled people to deploy, but it's much needed for VMAX.

    Nigel

  5. December 22, 2010 at 7:47 pm

    Vinay,

    Yes FAST VP seems to be pretty smooth.  Obviously like all things it will depend on your configuration and how busy your array is, but it moves extents around in the background nicely and seems to do exactly what it says on the tin – at least from my experience.

    If we know anything about EMC and their core Symmetrix engineering group – they don't release code unless they have tested it to the nth degree.  I can't necessarily say the same thing about SMC and SPA etc..

  6. December 22, 2010 at 7:52 pm

    Hi Alex,

    I'm aware of research done that suggests a high degree of probability that if I/O is received to a previously cold area (residing on SATA) there will be more I/Os to the surrounding 5-10MB.  And there is a moderate probability that there will be similar I/O to the surrounding 256+MB.

    VMAX will also try to always move at the sub-extent level (7.6MB).

    As for whether EMC have a tool similar to the one you mention….. no, not to my knowledge.

  7. December 22, 2010 at 8:46 pm

    Hi, Dimitris from NetApp here.
    What's not clear with any auto-tiering scheme is the effect on the total I/O performance of the array.
    EMC's own docs indicate that simply by going to virtual pools (at least on the CX), performance drops, and it drops even more when doing thin provisioning. But it's never quantified – the verbiage is more like "if you want the highest performance then use traditional RAID groups and LUNs".
    FAST adds even more metadata and processing. You can't get something for nothing.
    Ultimately – is what you get with auto-tiering in general worth the cost, complexity and hassle?
    What if you could get the purported benefits of auto-tiering via other means that are more easily quantifiable?
    D

  8. December 22, 2010 at 10:23 pm

    Hi Dimitris,

    Thanks for popping by and thanks for the disclosure…

    I hear what you’re saying re the effects on overall performance of the array. I remember when Hitachi first started doing their virtual pools, there was a lot of talk about the performance impact of allocating extents to thinly provisioned LUNs, and that is pretty well known. However, the performance impact of virtual provisioning without the thin provisioning aspect would be considerably less, though I agree there may be some impact.

    I also remember when Hitachi (Claus Mikkelsen) challenged anyone to configure a USP V with traditional RAID Groups and LUNs that could outperform one configured with their virtual provisioning. To me this suggested they were confident of the performance of their virtual pools. Mind you, I offered to take Claus up on his challenge but he never responded :-(

    The thing is though….. for the majority of enterprise arrays that will handle mixed workloads, the hassle of trying to configure traditional RAID Groups and LUNs and maintain a balanced array is too much.

    So, in answer to your question, Yes, trading off a minimal performance hit for a huge increase in simplicity and balance may be more than worth it for 99.9% of people.

    Oh and feel free to tell me how NetApp “get the purported benefits of auto-tiering via other means that are more easily quantifiable”

    Nigel

  9. December 23, 2010 at 12:09 am

    Hi Nigel,
     
    All I'm saying is, nobody really explains the exact impact – for instance, when I size a system, I tell a customer the precise impact a certain feature will have, and the tradeoffs. Yes, the idea is that it's probably better to have data spread out in multiple disks at the risk of increased metadata overhead than be spindle-limited by a 5-disk RAID group, plus pools are simpler anyway – which is why NetApp has been doing them for a while (as has 3Par and a few other vendors).
     
    However, we're not just talking about thin pools and wide striping, in this case we're also talking about additional work needed to move the stuff around depending on I/O.
     
    I guess it's too soon to tell – once customers start deploying heavy, highly random workloads we will see how auto-tiering technologies really cope. I've seen Compellent's fail under a rapidly changing random workload, but maybe EMC cracked it. The customer success stories will show what's going on.
     
    So, what does auto-tiering try to provide? What are the "asks"?
    1. Simplicity – easier deployment, easier administration, lower OpEx
    2. Lowered costs
    3. Improve or at least not harm performance.
     
    Seems that #1 could only be achieved if the algorithms are rock-solid and don't need to constantly be massaged and babysat. At the moment, every auto-tiering implementation seems to need someone to babysit it and check the rules, write exceptions etc.
     
    Regarding lowered costs: There are hefty licensing fees for the auto-tiering capability, plus it needs a lot of ancillary licenses. At what point does one hit the law of diminishing returns?
     
    Regarding performance: The jury's out but so far I haven't seen an auto-tiering implementation improve performance, and most of the time I've seen it harm performance. I hope you try this out and tell everyone how it's doing (unless some kind of legal restriction prohibits that… :) )
     
    On the NetApp front, our disk pools and gigantic Flash Cache already tackle all 3 auto-tiering "asks" since it's easy, lowers costs and dramatically improves performance. It's still auto-tiering, just without the data movement part.
     
    A bit anti-climactic, I know, but sometimes it's better that way :)
     
    Have a good one
     
    D

  10. December 27, 2010 at 6:41 pm

    Hi Dimitris,

    Are you serious that you tell customers the “precise impact” a certain feature will have? There aren’t many vendors out there that will do that!

    Also, are you suggesting that NetApp wide-striping is in the same league as 3PAR’s? I see 3PAR as best of breed in that department, and it’s quite something to put yourself up there with them.

    I hear your concern about auto-tiering products coming a cropper under certain workloads. I’d hope that vendors would have solid guidelines and best practices to help keep their customers away from such situations…. Oh, and I also hope that the sales guys on the street aren’t selling it as a silver bullet. Maybe it should come with a “use appropriately” sticker.

    I’m not sure that I agree that auto-tiering is supposed to improve performance. Like you, I know most implementations will have their overheads. I see it more along the lines of efficiency: delivering “appropriate” performance but with fewer disks, fewer floor tiles and less wasted space….. getting the access density of IOPS and capacity right. Growing your array via capacity or performance rather than just adding in more and more disks of the same type that were installed when the array was purchased.

    I agree with you that the jury will be out for quite a while on the simplicity side as well. Like you say, the algorithms are important, but equally so are the tools to manage, monitor and plan. Eventually it might deliver simplicity, but not on day one.

    Finally… on the point of NetApp Flash Cache delivering the equivalent of auto-tiering…. I’m not sure I see it as tackling the same problem or delivering the same results. I might be wrong though.

  11. December 27, 2010 at 8:56 pm

    Hi Nigel,
     
    First of all, I hope you had a great Holiday.
     
    I didn't compare NetApp vs anyone's striping, I just mentioned they are among the few vendors doing it for a long time. NetApp striping is pretty wide (for instance, hundreds of SAS disks in one pool). Wide enough for most use cases.
     
    Yes indeed I and all the other NetApp engineers I deal with tell customers the impact something will have on their system. NetApp sizing tools are extremely detailed and even spit out how busy the CPUs of the system will be based on a certain workload, which I then explain to the customers. This way they can avoid getting a wrong-sized system, since there's a lot more to total performance than number of spindles and memory. We even have predictive tools that tell us whether Flash Cache will have any effect (and what impact on the controller it will have). And the way workloads are characterized is very detailed (to the point of annoying many people expecting a flat IOPS answer).
     
    I haven't seen the sizing tools of any other vendor going to that depth (and I used to sell a few different ones). Typically they tell you how many spindles but don't take into account controller load, or don't characterize I/O in a detailed fashion. Very important.

     
    Unfortunately, what I see on the street is that, indeed, auto-tiering vendors sell that as a silver bullet, building configs that way without examining the I/O and the controller impact first. Maybe for the very large accounts it's not that way, but in the Commercial scene, it very much happens like that. It makes for a simple sales story but, again, it's early days. 
     
    EMC sells their Vmax auto-tiering as improved performance for the same cost, or same performance at lower cost – so they are selling it as a performance-enhancing architecture. Of course, we'll see. I'm sure this will be true in several situations.
     
    Lastly, regarding NetApp Flash Cache:
     
    I've had it lower I/O requirements for Oracle from 300+ mirrored drives on a competitive enterprise platform to a few dozen drives on a NetApp system. I've had it allow just 1-2 drive types in a system vs 3-4, and enable the use of larger drives (i.e. 600GB SAS) vs double the quantity of smaller drives. I've had it allow SATA drives run DB and Virtualization workloads.
     
    So yes – NetApp Flash Cache offers reduced footprint, reduced cost, increased performance, and vastly reduced administrative effort. Kinda sounds like what auto tiering is promising, the only difference is we're already delivering, and have been for a while.
     
    It's a bit like this: It's better if customers describe to a vendor the business problems they're trying to solve. There's more than one way to skin a cat, and various vendors will have different approaches. If NetApp can do certain things (and more) with cleverly virtualized blocks and the help of large caches that other vendors need elaborate auto-tiering schemes in order to accomplish, then it's a valid approach. Just different.
     
    The worst thing a customer can do is demand a specific way to handle a problem. That's how unfair RFPs are born… :) 
     
    And, last but not least, NetApp's WAFL subsystem has been abstracting the geometric vs logical position of disk blocks since 1992. In addition to cleverly placing chunks of data in the most appropriate places and moving stuff around if need be. And has only been evolving ever since. Which is why I'm not sure why some people think NetApp can't do autotiering. The block relocation code is there and it would be fairly easy. Consider this: Maybe, just maybe, we discovered a more efficient approach for our architecture, and don't need to mimic what's more efficient for someone else's architecture :)
     
    Thx
     
    D

  12. December 28, 2010 at 9:31 am

    Hi Dimitris,

    Yes I had a great Christmas, hope you did too.

    Are you seeing any change in customer requirements? By that I mean, are you seeing more customers coming to you with specific, well defined requirements, or are you seeing more customers come to you with finger-in-the-air requirements?

    I see fewer and fewer large customers coming with specific, well defined requirements, especially the larger accounts that are deploying at a horrific pace – they can barely get kit on the ground in time to meet demand. These customers don’t have specific requirements and look to technologies like wide striping and auto-tiering to balance and learn over time.

    What are you seeing with customers?

    Nigel

  13. December 29, 2010 at 1:24 am

    Actually I see 3 different customer profiles, irrespective of company size:
     
    1. The somewhat technically inexperienced customer that is content to let array intelligence help them out as much as possible using techniques like app-aware snapshots, wide striping, megacaches and/or autotiering. They will not tune things, and will just buy more gear as needed, even though a bit of fiddling would have reduced the expense and/or improved performance.
     
    2. The technically experienced customer that is willing to let array automation help them out but is aware manual intervention and tuning will be required.
     
    3. The customer that is a "control freak", doesn't trust automation or new technologies and likes instead to do things the old-fashioned way. Call it paranoia, job security or being a neo-luddite.
     
    I think it has more to do with the individuals making technology decisions than anything else. 
     
    However, in each case but the third, there are two sub-categories (and I'm trying to not over-analyze here):
     
    a) Loose requirements, not much vendor affinity. They need the most help and are the most susceptible to all vendor messaging/FUD.
     
    b) Very specific performance, configuration and capability requirements, tight vendor affinity – very susceptible to their favorite vendor's messaging/FUD.
     
    To summarize: If anything, statistically, I'd say that the larger customers I've encountered were more likely to be extremely specific about things. Maybe not the CIO types, but definitely the people touching the gear.
     
    I do think as certain automation technologies are maturing, we will see the shift. It will take time for the neo-luddites to convert but it will happen :)
     
    D

  14. Melonhead
    December 31, 2010 at 8:17 am

    Alex:
    I know that WAFL can easily reallocate blocks, but if I have two Aggregates, one made of SSD and the other made of SATA, can WAFL migrate parts of a LUN to the other aggregate while keeping a unified view? I don't think so. And WAFL will exhaust the free space of an SSD rapidly, which would reduce SSD performance significantly.

  15. Rene Fontaine
    January 4, 2011 at 8:14 pm

    Just a quick correction to the question on SNMP traps.  Solutions Enabler has the ability to generate SNMP traps directly for Symmetrix related events. Please refer to the release notes and install guide for details.

  16. January 17, 2011 at 4:14 pm

    Nigel, this is an interesting piece of information on FAST VP. Obviously you had access to a lot of information and did your homework well. I would like to respond, but the problem is no one else seems to know about this and there is no other source than your blog. It is not on any EMC website. I am in Asia this week and have visited EMC customers and joint EMC and Hitachi resellers, and no one seems to know about this. Please check my blog to see further questions. http://blogs.hds.com/hu/2011/01/where-is-fast-vp.html

  17. January 17, 2011 at 6:24 pm

    Hu,

    As I mention at the beginning of the post, I am surprised that EMC didn’t shout this from the rooftops. However, I imagine they will – at some point.

    Aside from that, I can confirm that FAST VP is GA and available to customers.

    I am very surprised that EMC customers and resellers (as you mention) are not aware of this, as info is available on PowerLink.

  18. Mox3311
    March 3, 2011 at 6:52 pm

    I like the special little feature that the SVP controls the tiering.
    I have questions for the brilliant EMC engineers:
    Really?
    Didya?
    A wintel PC controlling that important functionality.
    NOT NICE!
