OK then…. the *big* storage announcement at HP Discover was dedupe for 3PAR. Well, now that the fireworks are finished and the rock music’s died down, I think it’s time we take a closer look and make sure it’s not just a bunch of smoke and mirrors….
First though, let’s quickly summarise what HP announced and what we’re gonna be taking a closer look at.
What Was Announced
Basically, HP announced hardware assisted inline deduplication for the 3PAR 7450 all-flash-array (AFA).
Though like with most tech announcements at these kinds of events, it’s a pre-announcement. So nothing’s shipping yet, and regular customers probably shouldn’t expect to get their hands on it until around September of this year (2014). And when it does ship, expect it to ship as part of a major new code release – probably something like Inform OS version 3.2.1.
Anyway, let’s see if we get blast-off at the end of that 3.2.1 countdown, or just a drab misfire.
I think it’s fair to say that HP are aiming to deliver an enterprise class AFA weighing in at, or close to, the magical $2 per usable GB. And if they manage that with an all-flash 3PAR array, then it’s game on in the already white-hot AFA space!
NOTE: $2/GB is kind of magical because it’s the point where flash starts to become cheaper than high-performance disk
So time for some detail……
A Little Detail
So the initial support for this hardware assisted inline deduplication is only for the all-flash 7450 model.
No major surprises there. After all, the very nature of flash memory lends itself to dedupe a lot more than spinning disk does. Plus, there’s an all-out war going on in the AFA space right now, and it won’t be long before the real winners and losers are decided. So HP has to get in there before the game is won.
But, and a little straightforward speculation here….. despite the announcement being specific to the 3PAR 7450 AFA, let’s not forget that the entire 3PAR family runs the same code! So it’s fair to expect support to be extended to the rest of the 3PAR family pretty soon. But with one small requirement….. this stuff requires the latest Gen 4 ASIC! So that means only the following models:
If you happen to be the proud owner of something older, like a T800 – that has a Gen 3 ASIC – then I’m afraid you’re bang out of luck! A Gen 4 ASIC is a base requirement. And no, HP won’t be forking different code trees for Gen 3 and Gen 4 ASIC based systems. Older systems based on Gen 3 ASICs just won’t get the new code releases. So it really is the retirement home for those older systems.
Point being, it’s all about the ASIC!
The ASIC Giveth and the ASIC Taketh…..
…. and in this instance, blessed be the name of the ASIC.
So the absolute key to deduplication in an HP 3PAR array is the ASIC. No ASIC would mean no deduplication. Or at least not without lashing a massive tax on the CPUs, and almost certainly trashing performance.
Now then, in reality, the ASIC is key to all good things in 3PAR. For example, Thin Provisioning, Zero Detection, and RAID are all done by the ASIC. And because the ASIC is hardware, all of this is fast! So the ASIC helps a lot! What the ASIC doesn’t help with, though, is the ability to easily implement a software defined 3PAR – a VSA! In fact the ASIC is almost certainly the number 1 reason we still don’t have a 3PAR VSA – and maybe never will!?
Anyway….. not only does dedupe use the ASIC, it uses a bunch of functions and logic that are already implemented in the ASIC and already being used for other features. To see this, let’s take a quick look at how dedupe will work.
Walking Through Dedupe on 3PAR
As data comes in to the 3PAR, it goes through the ASIC (all I/O always goes through the ASIC).
The ASIC will perform a hash on the data to see if there’s a chance the data has been seen before. This hashing is new, but the ASIC is capable of doing it. Then exception table logic – already implemented in the ASIC and used by Thin Provisioning – is used with the hashes to flag potential duplicates. When a potential duplicate is flagged, a compare operation is initiated to rule out a hash collision (remember that read operations are practically free on flash). This compare operation uses the XOR engine that already exists in the ASIC for parity calcs: if the two blocks really are identical, the XOR returns a string of all zeros. And guess what, the ASIC already performs zero-detect as part of the thin persistence feature of the array (all data coming into the array is inspected by the ASIC for strings of zeros).
And while that’s high level, I think it’s pretty good at showing how existing technology already implemented in the ASIC, plus a bit of new hashing, is used as the foundation for hardware based deduplication in 3PAR.
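To make that flow a bit more concrete, here’s a minimal Python sketch of the same hash-then-verify pipeline. To be clear, this is purely illustrative: in the real array all of this happens in the ASIC, and the hash algorithm, the dictionary-based tables, and the storage structures below are my assumptions, not HP’s implementation.

```python
import hashlib

BLOCK_SIZE = 16 * 1024  # 3PAR dedupes at a 16KB granularity

# Hypothetical in-memory stand-ins for the ASIC's exception-table
# logic and the backing flash store.
hash_index = {}   # hash digest -> address of the stored block
store = {}        # address -> block bytes
next_addr = 0

def xor_is_all_zeros(a, b):
    """Mimic the ASIC's XOR-engine compare: identical blocks XOR to all zeros."""
    return all(x ^ y == 0 for x, y in zip(a, b))

def write_block(data):
    """Return the address the block lives at, deduping when possible."""
    global next_addr
    digest = hashlib.sha256(data).digest()  # hash choice is an assumption
    if digest in hash_index:
        addr = hash_index[digest]
        # Potential duplicate: confirm with a full compare (reads are
        # cheap on flash) to guard against a hash collision.
        if xor_is_all_zeros(store[addr], data):
            return addr  # true duplicate: just point at the existing block
    # New data: store it and index its hash.
    addr = next_addr
    next_addr += 1
    store[addr] = data
    hash_index[digest] = addr
    return addr
```

Writing the same 16KB block twice returns the same address and stores the data only once, while different data gets its own address.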
So what does all this mean?
Well for starters, I think it means HP were damn lucky. I mean already having most of the parts in place really helps.
They probably also got lucky with the ASIC being capable of performing the hashes. If the hashing capability wasn’t there, they’d either be asking the CPUs to do the hashing, or waiting for the Gen 5 ASIC.
But luck or foresight aside, the fact that it can all be done in hardware means it should be fast and of minimal impact to the wider system!
BTW: In saying this was somewhat luck, I don’t mean to take away from the enormous engineering effort to piece it all together and implement, I’m merely saying it could have been a very different story.
According to David Scott, a design mantra of the 3PAR platform is never to compromise performance. More than once he said they don’t implement features at the cost of performance. Fair enough, but David is Big Chief Sitting Bull of all things HP storage, so of course he’s gonna say that! But….. the theory that they’ve shared sounds fairly good. Though we mustn’t get carried away, as everything can be made to sound good in theory. The truth is, we won’t know for sure until the rubber meets the road!
When the Rubber Meets the Road
To their credit, HP recently published performance numbers showing in the region of ~900K IOPS at sub-millisecond latency. Cool, but that was yesterday….. before dedupe! Now we need good honest performance numbers with dedupe turned on!
But…. if they can get close to that kind of performance on a system sporting deduplication on top of the recently announced 1.92TB cMLC flash drives, well…… they’ll have a killer AFA platform on their hands.
If it can’t perform to that kind of standard, well….. they’ll have a decent AFA platform, but nothing special.
Now then, no discussion on dedupe would be complete if we didn’t talk about granularity (dedupe block size).
3PAR will be deduping at a block size of 16KB.
And if you know your 3PAR architecture, that won’t come as a surprise. 16KB seems to be the magic number in the 3PAR architecture.
Seems a tad on the big side right?
I mean, generally speaking, smaller is usually synonymous with better when it comes to dedupe block size. However, the reality is – like just about everything in the tech world – it’s a trade-off.
Sure, smaller block sizes typically yield better dedupe ratios. But at the cost of more metadata. And more metadata makes the solution much harder to scale. So a balance is needed.
Apparently HP tested different block sizes and found, for example, that deduping at a block size of 4KB got them something like a 5% improvement compared to 16KB. But…. at the cost of a lot more metadata.
However, based on the fact that (according to HP) somewhere in the region of 80-90% of I/Os on 3PAR arrays in the field are over 20KB, then 16KB looks like it might be a good number – yielding decent dedupe at pretty massive scale. Apparently the deduplicating 7450 will support 460TB of raw capacity. And that’s pretty monstrous when we consider most of the competition struggle to achieve escape velocity from fairly humble double digit raw capacity numbers.
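A quick back-of-the-envelope calculation shows why block size matters so much for metadata. The 32 bytes per metadata entry below is purely an assumed figure for illustration (HP haven’t published one); the point is that at 460TB raw, a 4KB block size means tracking exactly four times as many entries as 16KB.

```python
RAW_TB = 460            # quoted raw capacity of the deduplicating 7450
BYTES_PER_TB = 10**12
ENTRY_BYTES = 32        # assumed metadata bytes per block entry (illustrative)

for block_kb in (4, 16):
    blocks = RAW_TB * BYTES_PER_TB // (block_kb * 1024)
    meta_gb = blocks * ENTRY_BYTES / 10**9
    print(f"{block_kb:>2}KB blocks: {blocks:,} entries, ~{meta_gb:,.0f} GB of metadata")
```

Whatever the real per-entry size is, the ratio holds: a 4KB block size quadruples the metadata footprint for something like a 5% better dedupe ratio, which is the trade-off HP are making.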
So a 16KB fixed dedupe block size is what will ship.
So….. to summarise.
What customers will get – according to PowerPoint and coffee conversations with the engineers – is an enterprise class all-flash-array that does hardware assisted inline deduplication at a capacity that most of the competition can only dream of. And if the dedupe comes in at 4:1, customers should be looking at around $2/usable-GB and a maximum usable capacity of 1.4PB! Not too shabby.
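Those headline numbers are easy to sanity-check. The RAID efficiency (75%) and the raw flash price ($6/GB) below are my assumptions, not HP figures, but combined with the quoted 460TB raw and a 4:1 dedupe ratio they land right on the 1.4PB and $2/usable-GB marks.

```python
raw_gb = 460 * 1000          # 460TB raw, in GB
raid_efficiency = 0.75       # assumed usable fraction after RAID overhead
dedupe_ratio = 4             # the 4:1 ratio quoted in the article
raw_cost_per_gb = 6.0        # assumed 2014-era raw flash $/GB (illustrative)

effective_gb = raw_gb * raid_efficiency * dedupe_ratio
cost_per_usable_gb = (raw_gb * raw_cost_per_gb) / effective_gb

print(f"Effective capacity: {effective_gb / 1_000_000:.2f} PB")   # ~1.4PB
print(f"Cost per usable GB: ${cost_per_usable_gb:.2f}")           # ~$2
```

Play with the dedupe ratio and the maths makes the stakes obvious: at 2:1 the price per usable GB doubles and the value proposition gets a lot less interesting.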
Fair play. But what’s missing?
Well…. obviously the lack of a VSA is a sore topic! There’s still no flash cache, no real file implementation, and there’s no compression. We could probably do with some mature tools for dedupe estimation. Oh, and like we said, we don’t have any hard performance numbers with dedupe turned on! Aside from that though, on the deduplication front, it’s pretty hard to find fault.
If it works in the real world like it does on paper – performance, reliability and cost – then it’s look out Pure and everyone else!
Thoughts and comments welcome…………………
Check out my Linux and Storage training courses over at Pluralsight. I seriously think they’re seriously awesome! Seriously! 😀