EMC surprised me last week. On 15th December they released what, in my opinion, is the biggest uplift in functionality to their flagship Symmetrix product line in a very long time! The thing that surprised me was that after nearly two years of talking it up, they sneaked it out with barely a sound from Hopkinton. Not what I was expecting after 2+ years of development and nearly the same in ranting about it.
Still, who am I to know? So let's take a look….
DISCLAIMER: Opinions expressed and statements made in this article and elsewhere on this web site are my own personal opinions and do not represent those of any of my employers, past, present, or future. I do not guarantee the accuracy of anything said in this article. I do not work for EMC and I am not an authority on VMAX – not an official one at least.
The following list is what I consider the major features of this hugely important release (we'll go into more detail on some of them later in the article) -
- FAST VP/Sub-LUN auto-tiering (we’ll deep dive this)
- VLUN Mobility for VP
- VMware VAAI support
- Federated Live Migration
- The usual performance uplifts
Let’s just take a couple of quick seconds to cover the basics of the matter at hand.
The crux of the updates that EMC released is contained within the 5875 code, or more properly, Enginuity 5875. For those who may not know, Enginuity is the trademarked brand name for the microcode that has made the Symmetrix family of products tick for the last 20 or so years. Enginuity is so fundamental to Symmetrix that you could call it the DNA of the Symmetrix family, and what makes VMAX a Symmetrix.
Microcode is always important to storage arrays, but with the current EMC ethos being that hardware is essentially commodity and that real value and differentiation are driven from software, Enginuity takes an even more important role in the life of VMAX.
All five of the points I listed above are features of Enginuity 5875.
So now that we've got the background covered, let's get into the good stuff…..
FAST VP Deep Dive
Let’s cut right to the chase and get under the hood of FAST VP…..
There is no doubt that of the 50+ features and enhancements comprising the Enginuity 5875 release, FAST VP is the daddy! Much of the industry, including EMC themselves, have been extolling the virtues of sub-LUN tiering for what feels like the last 50 years.
However, despite waxing lyrical over its potential and investing so heavily in its development, EMC are the last of the so-called big three to come to market with the functionality. IBM were first to market with it in the DS8700, and more recently Hitachi delivered it in their VSP. However, as we should all know, it's not always about being first to market.
Anyway…… Let’s start with some basics but move quickly to the interesting stuff.
First up, FAST VP is an acronym for Fully Automated Storage Tiering for Virtual Pools.
As a quick high level summary, FAST VP is about bringing Sub-LUN tiering capabilities to VMAX – functionally speaking, it enables data in a VMAX device (I’m going to refer to them as LUNs for the remainder) to be spread across multiple tiers of storage within a single VMAX. Up to three different tiers to be precise. These three tiers might be SATA, FC and SSD. Oh, and it’s dynamic, data can move up and down the tiers.
Next up, FAST VP is built upon, and requires, VP Pools. So if you want to do Sub-LUN tiering on your VMAX, you first need VP Pools.
VP Pools is shorthand for Virtual Provisioning Pools. VP and VP Pools bring backend-wide striping to VMAX and are a fundamental enabling technology required for important features such as thin provisioning and now FAST VP/Sub-LUN tiering. For those of you who know Hitachi, I would say that VP is the feature equivalent of HDP in the Hitachi world.
FAST VP Extent Size
Now……. dynamically moving data up and down the tiers is usually done in fixed-size units. Terminology in the industry varies, with some vendors referring to these units as pages, others as chunks, and yet others as extents. EMC uses the term extent, so that's what I'll use in this article.
In VMAX, VP has the notion of an extent, the size of which is 768K, 12 tracks, or 1,536 sectors, depending on how your brain works. And because FAST VP is built on top of VP, the influence of VP can be clearly seen when examining the FAST VP extent size….
What the competition are currently doing: Hitachi took the 42MB extent (which they call a page – although many have referred to it as more of a book) that they use in their VP-equivalent product and passed it directly through to their Sub-LUN code. So Hitachi move data up and down the tiers in units of 42MB. On the IBM DS8, the Sub-LUN extent size is currently a whopping 1GB!
But what does all of that mean? Basically, on the IBM DS8, if you have only a few KB of data that needs to be moved up the tiers, that few KB will move along with the rest of the 1GB extent it lives in – even if there is no need to move the remainder. Sound a bit wasteful? The principle is the same for Hitachi, only the unit is much smaller at 42MB.
Clearly there are trade-offs to be made. The larger the extent, the more wasteful of space, whereas the smaller the extent, the more overhead in tracking and managing each extent. Then there are things like spatial and referential locality that need to be considered.
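To put rough numbers on that trade-off, here's a quick back-of-envelope sketch of my own (illustrative only, not vendor data) showing how much cold data tags along when a small hot region is promoted at each extent size:

```python
# Back-of-envelope: bytes moved to promote a small hot region,
# assuming (worst case for waste) the hot data sits inside one extent.
KB, MB, GB = 1024, 1024 ** 2, 1024 ** 3

def bytes_moved(hot_bytes, extent_bytes):
    """Whole extents must move, so round the hot data up to extent size."""
    extents = -(-hot_bytes // extent_bytes)  # ceiling division
    return extents * extent_bytes

hot = 64 * KB  # 64K of genuinely hot data
for name, extent in [("IBM DS8 (1GB extents)", GB),
                     ("Hitachi (42MB pages)", 42 * MB)]:
    print(f"{name}: {bytes_moved(hot, extent) / MB:.0f}MB moved")
```

With 1GB extents, promoting 64K of hot data drags a full 1024MB up the tiers; with 42MB pages the same promotion moves 42MB. Smaller extents waste less space but, as noted above, cost more in tracking metadata.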
So, for VMAX……. clearly the 768K VP extent is too small to be realistic, the overhead would be huge. So what did EMC choose for their extent size?
Well…… I kinda like what EMC have done here….. they have gone for a somewhat variable extent size, and I expect the future will bring even more variability…
A FAST VP extent is composed of 480 VP extents. Remember that a VP extent is 768K, therefore a FAST VP extent is ~360MB in size (480 x 768K). And no, I won't bother quoting it in tracks and sectors. Importantly, all of the 480 VP extents that comprise a FAST VP extent are sequentially addressed within a volume.
Now then, while on the surface it might look like EMC have taken the easy option, picking a value roughly in the middle of Hitachi and IBM, it’s not actually that simple…
Each FAST VP extent is further broken down into 48 sub-extents, with each sub-extent being ~7.5MB (10 x 768K, or 360MB/48). Again, the VP extents that make up a sub-extent are contiguously addressed within the volume.
So when talking about FAST VP, we currently have two extents -
- FAST VP extent = ~360MB (480 x 768K)
- FAST VP sub-extent = ~7.5MB (10 x 768K)
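The arithmetic behind those two figures is simple enough to check for yourself (note that 10 x 768K works out to exactly 7.5MB):

```python
# Sanity-checking the FAST VP extent arithmetic, working in kilobytes.
VP_EXTENT = 768                     # one VP extent = 768K (12 tracks)
FAST_VP_EXTENT = 480 * VP_EXTENT    # 480 contiguous VP extents
SUB_EXTENT = 10 * VP_EXTENT         # 10 contiguous VP extents

print(FAST_VP_EXTENT / 1024)        # 360.0 (MB)
print(SUB_EXTENT / 1024)            # 7.5 (MB)
print(FAST_VP_EXTENT // SUB_EXTENT) # 48 sub-extents per FAST VP extent
```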
To me, this is one of the reasons that being first to market isn't all that important. I'm sure most of the industry will consider the above values to be better than 1GB. Remember, these are just my personal opinions.
So why have the two values?
Well….. FAST VP may sometimes choose to promote at the FAST VP extent level (360MB). This can make sense because an I/O to an extent will very often be followed by an I/O to the surrounding 7.5MB, and quite often by an I/O to the surrounding 360MB. So moving up the tiers in the larger chunk may be good under some circumstances. However, if subsequent I/Os to the surrounding 360MB never actually happen, FAST VP can clean itself up and demote the FAST VP sub-extents that never received I/O requests, while leaving the sub-extents that did receive I/O in the higher tiers.
Also, having two extent sizes allows for a hierarchical, or filtering, approach. At the highest level it could track the busiest FAST VP extents and then promote only the busiest sub-extents within those FAST VP extents. This keeps metadata overhead to a minimum as well as making it quickly searchable.
Those are just two examples…
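To make the hierarchical filtering idea concrete, here is a minimal sketch of how such a two-level scheme might work. To be clear, the data structures, function names, and thresholds below are my own invention for illustration, not EMC's actual implementation:

```python
from collections import Counter

def pick_promotions(io_counts, top_extents=2, top_subs=4):
    """io_counts maps (extent_id, sub_extent_id) -> I/O count.

    Level 1: find the busiest FAST VP extents.
    Level 2: promote only the busiest sub-extents inside those extents.
    """
    per_extent = Counter()
    for (ext, _sub), n in io_counts.items():
        per_extent[ext] += n
    hot_extents = {ext for ext, _ in per_extent.most_common(top_extents)}
    candidates = Counter({key: n for key, n in io_counts.items()
                          if key[0] in hot_extents})
    return [key for key, _ in candidates.most_common(top_subs)]

# A quiet extent (id 2) is filtered out at level 1, so only sub-extents
# from the two busiest extents (0 and 1) are even considered.
io = {(0, 0): 100, (0, 1): 1, (1, 0): 50, (2, 0): 2}
print(pick_promotions(io, top_extents=2, top_subs=2))  # [(0, 0), (1, 0)]
```

The point of the two levels is exactly what the article describes: the per-extent counters stay small and cheap to scan, and the finer-grained sub-extent stats only need examining inside the extents that level 1 flags as hot.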
Does Extent Size Really Matter?
While my answer to that question today is a resounding “Yes”, I'm open to the possibility that in 5-10 years' time the answer to the same question may be “No”. Maybe, maybe not…. I'm just thinking about RAID…. not many people argue over RAID stripe sizes these days, and everybody pretty much agrees that RAID5 is RAID5 and RAID6 is RAID6 – with the exception of Alex McDonald from NetApp, who will be all over you if you suggest that NetApp's RAID-DP is the same as other vendors' implementations of dual-parity RAID.
Other Important FAST VP Stuff
Other important points, and points that show that EMC have not rushed this functionality to market, include -
- If your LUNs are thin provisioned, only the allocated extents are moved
- Any tracks that VMAX knows have Never Been Written By Host (NBWBH – see here) are not moved
- Any tracks that have been unmapped via space reclamation APIs, or tracks marked as should-be-zero (VMware VAAI use of WRITE SAME(0)), are not moved
- If VP extents are all zeros, they are not moved, but the metadata is tagged to belong to the new tier so that when any writes come in to these extents they will be targeted for the correct tier.
- FAST VP ignores local copy and replication workload so that it does not skew the stats. However, FAST VP is aware of copy and replication jobs and will throttle backend movements with these in mind
- Data in cache is ignored. FAST VP is trying to help us with our cache misses, so including cached data would skew results and be counter-productive
- FAST VP weights operations accordingly. For example, read-miss occurrences are more important than sequential writes – the sweet spot of SSD is still read-miss.
- Current, recent, and historical stats are monitored and used appropriately. Current and recent stats might be more important in promoting data, whereas historical stats might play more of a role in deciding whether or not to demote data…
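As a toy illustration of that weighting idea, a scoring scheme might look something like the sketch below. The weights, function names, and the recent/historical split are purely my own assumptions about how such logic could look, not EMC's actual algorithm:

```python
def promotion_score(recent_read_misses, recent_seq_writes,
                    w_miss=4.0, w_seq=1.0):
    # Read misses count for more than sequential writes:
    # the sweet spot of SSD is still the read miss.
    return w_miss * recent_read_misses + w_seq * recent_seq_writes

def should_demote(historical_scores, floor=10.0):
    # Demotion leans on history: only data that has stayed cold
    # over the long run gets pushed down a tier.
    return sum(historical_scores) / len(historical_scores) < floor

print(promotion_score(10, 10))          # 50.0 – read misses dominate
print(should_demote([1.0, 2.0, 3.0]))   # True – consistently cold
```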
Much of the above is designed to minimise the load on the backend during moves and is important to keeping FAST VP operations as non-disruptive as possible. It is also evidence of a product that is at least semi-mature. No doubt EMC could have come to market sooner, but at the cost of some of the above. And as an infrastructure architect and designer, I can promise you that I personally would not appreciate any of my vendors rushing a product to market before it was ready – I'd only end up with customers not wanting to use solutions with my name against them.
There are also plenty of configurables – yes, I know that configurables are often a double-edged sword; they're useful but you can easily cut yourself on them. Either way, FAST VP allows you to specify windows in which performance data is gathered. This allows you to ignore periods of time that might skew your data, such as batch runs and maybe overnight backups. LUNs can also be pinned so that they are guaranteed to live in a particular tier. Administrators can define movement windows so that backend movement, which is already tuned, only happens at times the administrator allows.
FAST VP Good but Still Not Perfect
So far so good, but annoyingly for me, the built-in Windows-based Service Processor still has a role to play in the operation of FAST VP. It's not a huge role, but I believe that if the Service Processor dies, promotions and demotions will stop after a few hours. Replacing the SP will bring FAST VP back to life and it will essentially pick up where it left off. However, I do not like paying for a Tier 1/World-Class/Enterprise array, call it what you will, only to find that some of its functionality relies on a Windows machine attached over an internal LAN cable.
Functionality currently relying on the SP needs moving into the VMAX ASAP!
FAST VP a Silver Bullet?
While FAST VP is a key technology for VMAX, it is not a silver bullet. Sub-LUN tiering promises much (I'll do a short post on this over the Christmas and New Year period) BUT it brings its baggage. Baggage includes complexities that none of us fully understand yet, as well as a whole load of architecture and design decisions that we are not familiar with. We have been designing and deploying enterprise arrays in pretty much the same way for the last 10 years, but sub-LUN tiering means many of those design principles are no longer valid. See here for a short discussion on this topic.
Before finishing up with FAST VP though, here are a couple more quick points -
- FAST VP movements will not cause VP Pools to become oversubscribed and will honour Pool Reserved Capacity settings configured by administrators.
- Yes, TDEVs are bound to VP Pools, and this is where extents are initially allocated from. However, over time a TDEV will have its extents spread across up to three tiers (equivalent to pools for the purposes of this point). If you need to move a TDEV back to a single VP Pool, VLUN Migration is your tool.
- There is a metadata overhead for every LUN that falls under the control of FAST VP. Every TDEV (including meta-members) under FAST VP control requires a single extra cache slot (64K) for FAST VP related metadata.
- FAST VP is currently FBA only.
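On the metadata point above, the per-box cost is easy to total up. A back-of-envelope calculation using the 64K-per-TDEV figure (the device count here is just an example I picked):

```python
CACHE_SLOT_KB = 64  # one extra cache slot per TDEV under FAST VP control

def fast_vp_metadata_mb(num_tdevs):
    """Total cache consumed by FAST VP metadata, in MB."""
    return num_tdevs * CACHE_SLOT_KB / 1024

# e.g. 8,192 TDEVs (including meta-members) under FAST VP control:
print(fast_vp_metadata_mb(8192))  # 512.0 MB of cache
```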
Anyway…. that’s probably enough about FAST VP for now. If you’re still reading, thanks for sticking with it and hopefully it’s been worth your while.
So on to some of the other stuff…
VLUN Mobility for VP
Virtual LUN, or VLUN for short, is the Symmetrix technology used to non-disruptively migrate LUNs between tiers of storage within a VMAX. A key difference between this and FAST VP is that this is not sub-LUN – this is about moving an entire LUN from one tier to another, and it's not automated.
BTW: Somebody please define “non-disruptive” for me.
Non-disruptive LUN migration is not a new concept, nor is it a new technology to VMAX. However, prior to 5875 code, VLUN did not understand Virtual Provisioning (VP).
So basically, if you were building your VMAX arrays with VP Pools prior to the 5875 code and you wanted to migrate LUNs between tiers/pools you were snookered. Imagine, for example, you had provisioned a LUN from a SATA Pool and found out that it needed moving to an FC pool for better performance, you were in a world of hurt! Good news though, as of 5875 code VLUN is now VP aware and can move LUNs between tiers and pools!
While not an industry first, this is an extremely important tool in the toolbox of any VMAX administrator.
Renaming VP Pools
With 5875 code comes the ability to rename VP Pools. Prior to 5875 you only got one shot at naming your VP Pools, get it wrong and you were stuck with it. You might laugh, but VMAX administrators across the world have been crying out for this. 5875 delivers.
Hey, sometimes it's the small things that count
Support for VMware VAAI
As of the 5875 code, VMAX supports the VMware vStorage APIs for Array Integration (VAAI) that came in with vSphere 4.1. Keeping it short, VAAI supports three hardware offload primitives: hardware-assisted block zeroing, hardware-assisted locking, and hardware-assisted full copy. I'll leave the detail at that, but will say that support for these was a must for Enginuity 5875.
Federated Live Migration (FLM)
This feature allows non-disruptive in-family tech refresh migrations, DMX to VMAX. If you have old DMX kit on the floor then this could be a god-send for your tech refresh woes.
I’ve had guys at EMC refer to this process as “totally non-disruptive” or “fully non-disruptive”. So is that different to other operations that are branded as just “non disruptive”? Are operations referred to as merely non-disruptive actually “almost non-disruptive” or “slightly non-disruptive”…..???
Data At Rest Encryption
Data At Rest Encryption, or DARE for short, is another interesting feature – oh, and another one that is available at apparently no performance cost, but requires a new Engine model with a slightly modified backend. That aside, encryption is starting to rear its ugly head, and the DARE capabilities in VMAX with 5875 code put a tick in that box. Obviously data-at-rest encryption does not solve the end-to-end encryption issue and is by no means the answer to all encryption requirements, but it is an arrow in your quiver.
But What Is Still Missing?
With so much in this release of microcode (I’ve only covered some of the stuff) there is still a ton of stuff missing. Notable absentees include -
- Still no virtualisation like that sported by the Hitachi USP, USP V and VSP. This one sticks out like a sore thumb to me, and while I genuinely believe that EMC do not see much value in this technology as implemented by Hitachi, I am amazed that they haven't plugged the gap yet – even if just to keep the competition quiet.
- Customer-impacting BIN file changes are almost gone, but a small number still annoyingly linger. Please make them all go away in the next version.
- GateKeepers. The bane of many a VMAX architect's or administrator's life. I'm not opposed to in-band management, I just don't like the cumbersome, ugly way GateKeepers are implemented. I know that the planet-sized brains at EMC could make the GateKeeper implementation much slicker if given the time. Please improve your GateKeeper implementation.
- Metavolume complexities still exist. The decision between the more flexible but lower-performing concatenated metavolumes versus the higher-performing but less flexible striped metavolumes still has to be made. It's about time such complexity was taken away. Administrators just want to create and grow LUNs on the fly, without having to make the aforementioned decisions. We want simplicity, performance and flexibility. After all, it is 2010.
- The underlying RAID architecture remains fundamentally unchanged. Today’s world of extremely large disks and arrays with extremely large quantities of these large disks requires a more modern approach to RAID. I’m a fan of distributed/declustered/parallel RAID. I hope the guys at Hopkinton are working on something, as current RAID implementations seen in the likes of VMAX, VSP and DS8 occasionally cause me to wake up at night in a cold sweat.
- No FCoE. Is this significant to the march of FCoE? I’m pretty sure CLARiiON already supports FCoE, so surely they could have ported the knowledge over to VMAX if they thought it was important?
- Finally, as far as I’m aware, customers still cannot program the blue LED strip light on the front of VMAX arrays! But if you want to see VMAX LEDs doing something cool, check this out at Chad's site.
Well…… that’s just over 3,000 words and completes a Deep Dive of another enterprise storage array (see here for Hitachi VSP), I hope it’s valuable to people and I wonder who is next…..?
If you have any thoughts I’d love to hear from you. Your thoughts and comments are welcome here. You can also talk to me about storage and technology in general on Twitter by tweeting me @nigelpoulton
Oh, one last thought…. I wonder what Moshe thinks of VMAX…