So there’s a ton of buzz in the industry around flash storage, and in particular all-flash arrays. Warranted buzz in my opinion, as I think a lot of all-flash arrays are about to start chomping their way through the infrastructure lunch that’s traditionally only been on the menu for big iron storage arrays like Symmetrix and even NetApp.
But that’s a topic that’s been discussed over and over again pretty much everywhere. I’ve got a technical question that’s interesting me at the moment……
I’ve had discussions with a few all-flash vendors. And to be honest, a lot of them are doing a lot of things the same way. Not everything. But a lot of things. Anyway….. one thing some of them are doing differently is garbage collection. Oh yeah! That fascinating subject of flash garbage collection that’s guaranteed to be at the top of your list of things to tell the next guy or girl you’re trying to impress :-S But seriously, it’s a pretty interesting topic. And one that, if you believe some vendors, makes a heck of a difference…… to system performance!
It goes a bit like this…….
System Level Garbage Collection
In the past I was brainwashed into thinking this was the one true way of doing garbage collection. Well…. probably not brainwashed, probably just that I couldn’t be bothered thinking too much about the other ways of doing it. Anyway….
With system level garbage collection, you basically turn off any native garbage collection the SSD flash drives are capable of, and do it all in the array’s firmware. This requires close collaboration with the SSD vendor so that you get a firmware version on the drives that lets you do things like disable garbage collection. The potential benefit is that the system (the all-flash array) performs garbage collection in a way that’s optimised for how the array works and lays down data on the flash drives. On the downside, it can contend with user workload for CPU, memory, bandwidth etc…
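For anyone who hasn’t had the pleasure, here’s roughly what garbage collection actually does, wherever it runs. Flash can’t overwrite a page in place. New data goes to a fresh page, the old copy becomes garbage, and a whole block has to be erased before any of its pages can be reused. So something has to pick a block, copy its still-live pages somewhere else, and erase it. The Python below is a minimal toy sketch of that loop using the textbook “greedy” policy (reclaim the block with the fewest live pages). The geometry, the 75% fill level and the two-block reserve are illustrative assumptions, not how any particular array or drive does it.

```python
import random

# Hypothetical geometry for a tiny simulated flash device (illustration only).
PAGES_PER_BLOCK = 64
NUM_BLOCKS = 32


class ToyFlash:
    """Out-of-place writes plus greedy garbage collection (a textbook policy)."""

    def __init__(self, num_blocks=NUM_BLOCKS, pages_per_block=PAGES_PER_BLOCK):
        self.ppb = pages_per_block
        self.valid = [set() for _ in range(num_blocks)]  # live logical pages per block
        self.used = [0] * num_blocks        # pages programmed since the block's last erase
        self.where = {}                     # logical page -> block holding its current copy
        self.free_blocks = list(range(1, num_blocks))
        self.active = 0                     # block currently being filled
        self.host_writes = 0
        self.flash_writes = 0               # host writes + GC relocation writes

    def _program(self, lpage):
        if self.used[self.active] == self.ppb:      # active block full: open an erased one
            self.active = self.free_blocks.pop()
        old = self.where.get(lpage)
        if old is not None:
            self.valid[old].discard(lpage)          # previous copy is now stale garbage
        self.valid[self.active].add(lpage)
        self.where[lpage] = self.active
        self.used[self.active] += 1
        self.flash_writes += 1

    def _collect(self):
        # Greedy victim choice: the full (non-active) block with the fewest live pages.
        full = [b for b in range(len(self.used))
                if b != self.active and self.used[b] == self.ppb]
        victim = min(full, key=lambda b: len(self.valid[b]))
        for lpage in list(self.valid[victim]):
            self._program(lpage)                    # copy live data elsewhere
        self.used[victim] = 0                       # "erase" the whole block
        self.free_blocks.append(victim)

    def write(self, lpage):
        self.host_writes += 1
        while len(self.free_blocks) < 2:            # keep a small reserve of erased blocks
            self._collect()
        self._program(lpage)


def simulate(fill_ratio=0.75, overwrites=20_000, seed=42):
    flash = ToyFlash()
    logical_pages = int(fill_ratio * NUM_BLOCKS * PAGES_PER_BLOCK)
    rng = random.Random(seed)
    for lpage in range(logical_pages):              # fill the device once
        flash.write(lpage)
    for _ in range(overwrites):                     # then hammer it with random overwrites
        flash.write(rng.randrange(logical_pages))
    return flash


if __name__ == "__main__":
    f = simulate()
    print(f"write amplification: {f.flash_writes / f.host_writes:.2f}")
```

The extra copy writes are why write amplification ends up above 1, and it’s that copy work that has to share CPU, memory and back-end bandwidth with user I/O when it runs on the array’s controllers.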
I think one of the reasons I took this as the de facto and optimal modus operandi for flash in a storage array was that back in the day, when the traditional array vendors were hurriedly shoe-horning flash into their aging 20-year-old architectures, they pretty much had to do a load of custom work on the flash drives to make them work in a system designed entirely around spinning disk. BTW, how ugly do those solutions look these days!
Basically if the vendors had just taken an SLC flash drive and lashed it into one of their aging systems, it could probably have chewed its way through the entire internal I/O capability of the array – starving every other disk drive in the array of resources….. After all, on some legacy arrays, a single flash drive had the potential to chew through the entire bandwidth and IOPS of the back-end loop/bus/stack it was on.
Oh and of course they had to make changes in the array-level code so that the way the array wrote data to its internal drives didn’t wear out the flash cells before you could say “Enterprise Flash Drive (EFD)”.
Anyway, until today I just took it as a given that the best way to do garbage collection was at the system/array level.
Drive Level Garbage Collection
As with most things in technology – there is another way!
So apparently, as SSD/flash drives have matured over time, so have the firmware and smarts they ship with. Part of those smarts is garbage collection.
The idea of the drive doing the garbage collection is about as far away from rocket science as it gets (though I have to admit, from my uninformed position, rocket science seems like a pretty simple upward thrust vs the pull of gravity equation, but what would I know). Anyway, the drives already have smarts that take care of garbage collection, and outsourcing the garbage collection to the drive follows the popular technology model of offloading tasks to components as far down the stack as possible – think VMware VAAI and MS ODX etc….
Also, disk drives these days are mini computers in their own right – they’ve got storage, a controller with a processor running software/firmware, and some form of network interface. So why not offload garbage collection down the stack?
Sounds simple, right? And I do see its merits.
But….. doesn’t it put you at the mercy of the drive vendor and the quality of their firmware? And isn’t their firmware targeted at the consumer market where load and demands are different from enterprise use cases? What happens when they report a bug that could result in data loss and tell you to upgrade to the latest drive firmware ASAP, but XYZ vendor hasn’t qualified it yet? And don’t give me the “At XYZ vendor we do rigorous testing before we ship anything to customers in our arrays”. Bugs still get through that kind of testing….
Does Any of This Matter?
OK but should anyone care?
Well…….. according to the folks at EMC XtremIO, everyone thinking of buying an all-flash storage array should care.
Why? Because according to them, some of their competition – who do system level garbage collection – see significant performance drops when servicing production workloads at the same time as performing so-called background garbage collection.
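Here’s a back-of-envelope way to see how that kind of drop can happen when garbage collection and user I/O share the same back end. It’s a toy Python model, not a measurement of any product: every number in `Params` is made up, and it assumes that reclaiming one page costs (write amplification - 1) copy writes. It compares GC paced alongside host writes every second with GC that sits idle until free space hits a low watermark and then grabs a big chunk of the back end.

```python
from dataclasses import dataclass


@dataclass
class Params:
    # All numbers are hypothetical, chosen only to keep the arithmetic easy to follow.
    backend_ops: int = 100_000    # flash writes per second the back end can sustain
    host_demand: int = 40_000     # host writes per second the application asks for
    wa: float = 2.0               # write amplification (flash writes per host write), must be > 1
    free_pages: int = 200_000     # free space at t=0, in pages
    low_water: int = 50_000       # bursty mode: start GC below this much free space
    high_water: int = 150_000     # bursty mode: stop GC once this much is free again
    gc_burst_ops: int = 70_000    # bursty mode: flash writes/s GC grabs while it runs


def run(p: Params, seconds: int, paced: bool) -> list[int]:
    """Return the host writes completed in each simulated second."""
    relocations_per_page = p.wa - 1.0   # assumed GC copy writes needed to free one page
    free, gc_running, history = p.free_pages, False, []
    for _ in range(seconds):
        if paced:
            # Interleave GC with host I/O at the steady-state ratio, every second.
            host = min(p.host_demand, int(p.backend_ops / p.wa))
            gc_ops = int(host * relocations_per_page)
        else:
            # Leave GC idle until free space runs low, then let it take a big share.
            if free < p.low_water:
                gc_running = True
            gc_ops = p.gc_burst_ops if gc_running else 0
            host = min(p.host_demand, p.backend_ops - gc_ops)
        free += int(gc_ops / relocations_per_page) - host
        if free >= p.high_water:
            gc_running = False
        history.append(host)
    return history


if __name__ == "__main__":
    p = Params()
    print("paced: ", run(p, 12, paced=True))
    print("bursty:", run(p, 12, paced=False))
```

With those made-up numbers the paced schedule holds 40,000 host writes/s flat, while the watermark-triggered one drops to 30,000 for three seconds out of every six. Which of these any given array actually does is a question for the vendor.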
Now I can’t verify this. But I know this much: if I were considering buying an all-flash storage array, I’m damned sure I’d be filling it to nearly full, putting it to the sword from an I/O perspective, and sitting there waiting to see if performance tanked during a garbage collection run on the array.
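If you do run that kind of test, you’ll end up with a per-second IOPS log from whatever load generator you’re using. Here’s a minimal Python sketch for spotting the dips in it. The median baseline and the 80% threshold are arbitrary assumptions (tune them to taste), and the sample numbers are made up purely to show the shape of the output.

```python
from statistics import median


def find_dips(samples: list[float], threshold: float = 0.8) -> list[tuple[int, int, float]]:
    """Return (start, end_exclusive, worst_value) for each run of samples
    that falls below threshold * median(samples)."""
    if not samples:
        return []
    floor = threshold * median(samples)    # assumed baseline: the median interval
    dips, start = [], None
    for i, v in enumerate(samples):
        if v < floor and start is None:
            start = i                      # a dip begins
        elif v >= floor and start is not None:
            dips.append((start, i, min(samples[start:i])))
            start = None                   # the dip has ended
    if start is not None:                  # the log ended mid-dip
        dips.append((start, len(samples), min(samples[start:])))
    return dips


if __name__ == "__main__":
    # Made-up per-second IOPS samples, not a measurement of any array.
    samples = [52000, 51500, 52300, 51800, 30100, 28700, 51900, 52200]
    print(find_dips(samples))   # [(4, 6, 28700)]
```

For latency rather than IOPS you’d flip the comparison and look for spikes above the baseline instead.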
Worth keeping in mind if you’re looking for a new all-flash array.