I've Seen The Future of SSD Arrays!

I've a keen interest in SSD, especially SSD-based arrays.  So I was pretty damn excited when, at last week's HP Discover event in Vienna, I saw what could well be the future of SSD and SSD arrays, and it's cool, really cool…

Setting the Scene

I'm not a fan of taking legacy array technologies and shoe-horning them full of SSD. Frankenstorage springs to mind!

I am, however, slightly more of a fan of technologies like Violin Memory and Kaminario (to name just a couple). 

However, what I've seen at HP Discover has the potential, in my opinion, to make even the likes of today's Violin Memory and Kaminario arrays look legacy, very soon.

Let me take a really quick minute to set the scene… it won't take long…

In many ways I like Violin Memory.  It's been designed almost from the ground up with SSD in mind – definitely not a technology designed for rotating rust and then fudged or butchered for SSD.  Take the lid off one and you'll see exactly what I mean.

[Image: Violin Memory array with the lid off]

It looks like a lot of thought and design effort has gone into it, and the technologist inside of me likes that.  However… I wonder if it's too custom and too proprietary to live and thrive in today's market?  Today's market demands commodity and all that jazz!

On the other hand there is Kaminario.  These guys take standard off-the-shelf Dell blade systems and off-the-shelf Fusion-io cards, layer some clever software on top, and out pops a high-performance SSD array.  It ticks the commodity-is-king and software-is-everything checkboxes, but it has its drawbacks: servicing the Fusion-io cards is clunky and requires you to crack the lid of the blade server open (never good in Tier 1 production data centres).

[Images: Kaminario K2; Fusion-io card]

Doing things properly

So what I saw at HP Discover could well be the future of SSD and SSD arrays, and it goes by the name of SCSI Express.

SCSI Express is a protocol that is currently being standardised within INCITS T10 by the SOP-PQI Working Group and the SCSI Trade Association, with involvement from the SFF Committee and PCI-SIG.  Quite a crew and quite a project, but it was suggested to me that it might be standardised in six or so months.

SCSI Express enables SCSI over PCIe (SOP), and under the hood it is a SCSI initiator talking to a SCSI target over PCIe via PQI. 

[Diagram: the basic SOP stack – a SCSI initiator talking to a SCSI target over PCIe via PQI]

PQI stands for PCIe architecture Queueing Interface, a flexible and extensible transport layer that is very fast and lightweight.  I'm told that it leverages the best of the existing proprietary SCSI-over-PCIe solutions available from companies such as PMC, LSI, Marvell and even HP, the difference being that PQI and SCSI Express are being developed as open standards rather than being proprietary to the above-mentioned companies.  Existing SCSI-over-PCIe protocols, such as MPI from PMC, are found in silicon in most of the array controllers we see in the world today, including those from EMC, NetApp, Hitachi…
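
To make the queuing idea a little more concrete, here's a rough sketch of the kind of paired submission/completion queues a PQI-style transport keeps in host memory. To be clear, the structure names and fields below are my own invention for illustration – they are not taken from the SOP/PQI drafts:

    #include <stdint.h>
    #include <string.h>

    /* Illustrative only -- names and layouts are NOT from the SOP/PQI drafts. */
    struct sub_element {                /* host -> device request               */
        uint16_t request_id;            /* used to match the completion later   */
        uint8_t  cdb[16];               /* the familiar SCSI CDB rides inside   */
        uint64_t data_addr;             /* host buffer the device DMAs to/from  */
        uint32_t data_len;
    };

    struct queue {                      /* one circular queue in host memory    */
        struct sub_element *elements;
        uint32_t num_elements;
        volatile uint32_t pi;           /* producer index (doorbell on real HW) */
        volatile uint32_t ci;           /* consumer index, advanced by device   */
    };

    /* Host side: drop a request on the submission queue and ring the doorbell.
     * A completion queue works the same way in the opposite direction.         */
    static int submit(struct queue *sq, const struct sub_element *req)
    {
        uint32_t next = (sq->pi + 1) % sq->num_elements;
        if (next == sq->ci)
            return -1;                  /* queue full                           */
        memcpy(&sq->elements[sq->pi], req, sizeof *req);
        sq->pi = next;                  /* real hardware: a doorbell write      */
        return 0;
    }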

Blah blah blah…. but what does this all mean?

Well, here we have a protocol and interface that is being designed especially for high speed SSD, not spinning rust!

It might not sound like much, but when you look at some of the legacy architectures we are currently bolting SSDs to (it's not uncommon to stick an SSD capable of 40,000 IOPS into an array with a back end that can support only a fraction of that), it starts to bring this into perspective.  Today's SSDs are severely hamstrung by the legacy architectures we bolt them to, and it verges on a crime to do so.
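
A quick back-of-the-envelope illustrates the waste. The numbers below are mine and purely illustrative – a nominal 40,000 IOPS per SSD, an invented drive count and an invented back-end ceiling, not measurements from any particular array:

    #include <stdio.h>

    int main(void)
    {
        /* Illustrative numbers only -- not from any specific product. */
        const long iops_per_ssd    = 40000;   /* what a single SSD can deliver      */
        const int  ssd_count       = 16;      /* SSDs crammed into the array        */
        const long backend_ceiling = 100000;  /* what a legacy back end can sustain */

        long raw = iops_per_ssd * ssd_count;
        printf("Raw SSD capability : %ld IOPS\n", raw);
        printf("Back-end ceiling   : %ld IOPS\n", backend_ceiling);
        printf("IOPS left on table : %ld (%.0f%% wasted)\n",
               raw - backend_ceiling,
               100.0 * (raw - backend_ceiling) / raw);
        return 0;
    }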

SCSI Express will release the shackles.  It will allow you to take a hot-plug 2.5-inch form-factor SSD and install it into a 2.5-inch form-factor drive bay on the front of an industry-standard server, just like we do with hot-plug drives today.  The major difference is that the SSD won't be hamstrung by SAS or SATA.  The drive will mate with a specially designed, but industry-standard, interface and talk a specially designed, but again industry-standard, protocol (the protocol enhances the SCSI command set for SSD) with standard drivers that will ship with future versions of major operating systems like Windows, Linux and ESXi.
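
And because the command set is still SCSI, applications shouldn't have to know or care. Here's a minimal sketch of what that looks like from user space, assuming the standard in-box driver has enumerated the drive as an ordinary SCSI block device (/dev/sdb is a hypothetical name):

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        /* /dev/sdb is a hypothetical name -- whatever the standard in-box
         * driver enumerates the SCSI Express drive as.                     */
        int fd = open("/dev/sdb", O_RDONLY);
        if (fd < 0) {
            perror("open");
            return EXIT_FAILURE;
        }

        char buf[4096];
        /* A perfectly ordinary read: no vendor-specific ioctls and no
         * custom driver -- which is the whole point of standardising the
         * protocol and shipping the drivers with the OS.                   */
        ssize_t n = pread(fd, buf, sizeof buf, 0);
        if (n < 0)
            perror("pread");
        else
            printf("read %zd bytes from the start of the drive\n", n);

        close(fd);
        return 0;
    }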

On the topic of hot-plug, I'm led to believe that this is not very robust on PCIe as we know it today.  However, it's doable, and the guys at HP Discover told me that this should be standardised and available by the time this is productised (somewhere around the end of 2012).

The Future of SSD Arrays

So, again, in my humble opinion, the future of SSD arrays is unlikely to look like a VMAX, VNX or even VSP…  Nor is it going to look like a Violin Memory array.

In my opinion it is going to look like an HP ProLiant, a Cisco UCS… name your industry-standard off-the-shelf x86 server, crammed full of industry-standard form-factor hot-pluggable SSDs, running SCSI over PCIe, with all of the smarts and clevers in software on top of VMware (sorry, couldn't resist throwing the VMware comment in).

Seriously though, I can see it.  While I love SSD and some of the SSD arrays out there, I've always felt like there is something not quite right about them.  I think SCSI Express/SOP is the missing magic!

And when these products ship and change the world, I plan on putting my feet up and retiring, as this will clearly solve every problem that the storage world has or ever will have!

Prototype at HP Discover

The concept box on display at Discover was an early prototype: an HP ProLiant server with an early Fusion-io 2.5-inch SSD connected to the PCIe bus (I know it's not really a bus) via an SFF-8639 backplane connector (PCIe Gen3 x4).  However, this is also doable over PCIe cable implementations.

[Images: SOP prototype server; SOP interface card]

In the prototype unit at HP Discover, the SCSI Express drives connected to the PCIe bus and bypassed the HP RAID controller, similar to the picture below.

[Diagram: SCSI Express drives bypassing the RAID controller]

This kind of implementation leaves at least a couple of options when it comes to RAID:

  1. Implement RAID higher up the stack, utilising CPU cycles (the dreaded software RAID) – there's a quick sketch of what that means below.
  2. Develop newer RAID controllers with SCSI Express and SCSI over PCIe in mind.
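
To put a little flesh on option 1: the heart of RAID 5 in software is nothing more exotic than XOR-ing data blocks on the host CPU, burning cycles on every write. A minimal sketch of parity generation for a single stripe (the block size and stripe width below are arbitrary, purely for illustration):

    #include <stddef.h>
    #include <stdint.h>

    #define BLOCK_SIZE 4096   /* arbitrary stripe-unit size, for illustration */

    /* RAID 5 parity for one stripe: parity = XOR of all the data blocks.
     * This is the work "software RAID" burns host CPU cycles on for every
     * full-stripe write (partial-stripe writes cost more still, because of
     * the read-modify-write of old data and old parity).                   */
    void raid5_parity(uint8_t *parity,
                      uint8_t *const data_blocks[], size_t num_data_blocks)
    {
        for (size_t i = 0; i < BLOCK_SIZE; i++) {
            uint8_t p = 0;
            for (size_t d = 0; d < num_data_blocks; d++)
                p ^= data_blocks[d][i];
            parity[i] = p;
        }
    }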

RAID… now there's another technology that could do with being brought into the 21st century!  But that's another conversation.

The plug for mating with the server is SFF-8639, and the current board connects to the server via SAS cables.

Footnote: To SCSI or not to SCSI

Interestingly, the somewhat competing standard of NVM Express is looking to do its magic without SCSI.  A bit bolder, but I have to wonder how much harder?

Half of me would love to get rid of SCSI and the legacy that it brings.  But then again, three quarters of me would have liked to see Ethernet replaced with something like InfiniBand.  Ethernet clearly isn't going anywhere, and my head tells me SCSI isn't either.  Too deeply entrenched.

On the positive side though, SCSI is battle hardened and well understood.

Footnote: Competing Standard NVM Express and EMC

No storage futures story is complete without mentioning EMC. Sorry, HP and the rest of the storage industry, but I like to be honest.

Interestingly EMC are not on the list of companies behind SCSI Express.  But they are on the list of those behind NVMe!

As we know, EMC are one of the biggest families in the storage Mafia, and they have significant influence over VMware, one of the biggest families in the technology Mafia.  Now, in my mind (where 2+2=33), having the daddy of the storage industry behind NVMe, coupled with the interesting noises that VMware has been making about the future of storage, I can't stop my mind running wild with what they might be up to… I wouldn't even put it beyond them to be planning the death of SCSI!

Anyway, enough for now.  Thoughts and comments mandatory ;-)

22 comments for “I've Seen The Future of SSD Arrays!”

  1. December 3, 2011 at 11:58 pm

    One thing I don't follow is the hating on RAID that's espoused here and elsewhere…  Most of the proposals I see for its replacement are either block or file scattering out around the spindles/SSDs… which amounts to a capacity multiplier based on the number of copies desired.
    This does have a HUGE win in that disk rebuilds don't have the RAID5 penalty (assuming the blocks are random-scattered rather than done statically in a classic RAID group).  But it doesn't answer RAID5's big win: redundancy without a massive capacity overhead.
    -Jason

  2. Chris
    December 5, 2011 at 4:41 pm

    Nigel,  Great write-up.  One correction:  The 2.5" drive in the demo was provided by Fusion-io, not STEC.

  3. HansDeLeenheer
    December 5, 2011 at 6:19 pm


    Nigel,
    Great review. There are a lot of opportunities now that solve some of our biggest storage challenges (high IOPS being the biggest) by keeping those IOPS out of the array. Think about Fusion-io, but also Xsigo, and here with SCSI over PCIe. If you think plain ProLiant servers could be the ultimate solution, how would we handle the shared part of storage then? Will the shared array become obsolete? Will we add shared storage as a component of the server and have VSAs on top?
    Will we still have a need for shared storage at all? If high availability is resolved at the hypervisor, which doesn't need shared storage anymore, why would we use it anyway?

  4. December 5, 2011 at 8:20 pm

    Hans:
    My reading between the lines indicates that the HBA in these ProLiant servers would be more akin to a port on an array, software handles turning the Fusion-io cards/SSDs into 'LUNs', and the SAN array magic is coded into x86, probably running under a stripped-down version of Linux… so these ProLiant servers would be the array.
    The downside would be the number of switch ports dedicated to storage would go up (as opposed to ports dedicated to servers).
    -Jason

  5. December 5, 2011 at 9:18 pm

    Hi Jason.

    Thanks for commenting. My issue is with legacy implementations of RAID. They don't scale with today's demands, workloads, drive sizes or architectures. Some of the more modern implementations go a long way. However, I've scratched and scratched my head for a better way to do it and, funnily enough, my pea-sized brain hasn't come up with anything ;-)

    As for your second comment re the ProLiant being the array: that's what I'm expecting to see going forward, commodity x86 servers being SSD arrays. Things are already moving in that direction. I wish I had more detail on the implementation, but it's pretty cool stuff.

    Chris. Thanks for the correction. I've updated the post accordingly.

    Hans.

    A couple of points.

    First. It's my opinion that HP needs to do something to take the game back to Cisco after Cisco has taken the game to HP with UCS. My personal opinion for a while has been that a Xsigo or Xsigo-type solution would be an awesome technology if backed or bought by HP. Add SCSI Express and SOP into the mix and you have a cracking solution.

    Second. My thoughts around the ProLiant, or x86 server, being the end game for SSD arrays are for them to be shared arrays. If we look closely, most shared storage arrays these days are based on x86 architectures. HP's line-up borrows a lot from the ProLiant range and will borrow more and more going forward. Even VMAX is a loosely coupled group of eight x86 server engines, of sorts.

    Great comments and very interesting times.

  6. December 5, 2011 at 9:55 pm

    Nigel: Well, I've heard that IBM's XIV solution implements what I referred to as 'random scatter, multiple copies' in order to drop the rebuild time to a fraction of what was happening… taking a page from Hadoop's playbook there… Which is great if you can handle doubling or tripling your capacity requirements.
    My 'pea-sized brain' has tried to work through a hybrid of that and RAID5, but I think the algorithmic complexity would really suck in the real world… that and you'd almost have to work at the file rather than block level to make it work (which would suck too).
    It's surprising to me that Fusion-io hasn't tried to play this game themselves… commodity server, full of modified SSDs, park 2 x dual-port 16 Gb/s HBAs on the back, and boom, you've got a rocket ship of an array…  heck, most HBAs can implement SCSI target mode… the rest is just some simple plumbing.  The hardest part is probably finding a motherboard that has sufficient 8-lane PCI Express slots.
    Heck, if you've implemented a nice 10 Gig network, you could even do it over iSCSI and do 'top of rack' storage rather than bothering with switches…

  7. December 6, 2011 at 7:56 am


    Hi Nigel, thank you for the post on SCSI Express. It sounds very promising. Still, I see that we have a long way to go, since proper addressing of SSD or SCSI Express devices is also a problem for the filesystems and how they interact with the drivers and the underlying devices.
    Also, I doubt whether software RAID or common RAID (in a new SCSI-over-PCIe controller) will help us solve the rebuild-time issue.
    -Roger

  8. December 6, 2011 at 12:32 pm

    @Nigel
    My last point was based on that Fusion-io demo last week where they got 1.5 million IOPS out of one single server (DL580 + 10 x 1TB Fusion-io). If we can fit this size of data with that amount of IOPS in one box, why would we still use shared storage/resources? Why not just move back to local storage and leave the failover mechanisms to the hypervisor? The solutions you are proposing (legacy servers as array members) are just the next step for things that already exist, such as LeftHand and EqualLogic.

  9. Dave
    December 6, 2011 at 6:04 pm

    Hi Nigel, you really should check out Whiptail Technologies. They offer SSD arrays where all the magic is in the software stack, utilising standardised hardware.

  10. December 6, 2011 at 9:20 pm

    Interesting blog. Thanks for thinking of us. We at Kaminario agree that in the long run, interface standards designed for SSD performance and latency will benefit everyone. I'd like to point out a few things, however. First, standards and the products that use them take time to mature, usually beyond the first version, which is often released later than projected. Second, an enterprise SSD architecture is just as much about scalability, availability, manageability, and data protection as it is about performance and latency. The Kaminario K2 provides a seasoned, SAN-based architecture designed from the ground up for SSD today, with all the scalability, availability, manageability, and flexibility enterprises demand, including the ability to mix DRAM and Flash in the same box to meet specific application requirements. Finally, it's easy to swap out the K2's storage and management modules without having to crack open anything. We all agree with Nigel, but in the meantime try the K2 for a very workable solution today!

  11. December 6, 2011 at 9:37 pm

    Jason,

    Yes, I know the IBM XIV technology and its pseudo-random data layout and scattering for RAID 1-like protection and very fast and efficient rebuilds, and I'm a fan. As for the server full of Fusion-io cards, I've heard of skunkworks projects at various vendors doing just that. I think that motherboards with sufficient PCIe capabilities may start to appear at around the same time as SCSI Express is ratified as a standard. The two could come of age at the same time to create a perfect storm, if you will. Just thinking out loud, of course.

    Roger,

    I was assured while at Discover that the OS and driver side of things was being worked on and should not be a problem, although I see where you're coming from. I guess we'll have to wait and see, but it's certainly moving in the right direction.

    Hans,

    I just think there is still benefit in the shared storage model. The DAS approach has its uses, but I think shared arrays do too, even shared SSD arrays. Take Exchange 2010, for example: the DAS model works in some cases but absolutely doesn't in others. But then again, I'm a storage guy, so of course I would like shared arrays ;-)

    Dave,

    I'm well aware of WhipTail and like what they do.

    Gareth,

    I’m a fan of Kaminario and the K2 and think it is a great solution for today and tomorrow. But I think things will be a lot different very soon. Hopefully you guys are on top of things like SCSI Express and NVM Express.

  12. December 7, 2011 at 2:37 pm

    I was tweeting about how storage breaks the economics of virtualization and private cloud and @hansdeleenheer responded to me with a link to this blog. 
    Incumbent storage technology is ripe for a major disruption and all indications are that we are about to witness it.
    Traditional "cloud" vendors, such as Google, Amazon, Salesforce and so on, don't use EMC or NetApp storage for a good reason – lack of scale-out capability, management cost, and lack of modular capability (not to mention the upfront cost). Cloud is all about application availability and scalability, and if your back-end architecture does not provide that level of flexibility then your cloud is dead in the water.
    Now look at a Cisco UCS system (to name one): it provides 32 TB of internal storage. With the right connectivity across nodes and the right software stack to enable enterprise storage capability on commodity hardware, customers would have the perfect highly available and highly scalable commodity-based hardware platform for highly virtualized private cloud deployments.
    Storage will benefit from the level of flexibility, scalability and mobility that hypervisors brought to servers. At Sanbolic we firmly believe that this is the case, and Melio gives customers not only a hypervisor- and server-agnostic approach but also the ability to aggregate internal storage independent of its media (SSD, HDD and Fusion-io-like technology).

  13. December 7, 2011 at 11:49 pm

    Fascinating stuff, Nigel.  You're definitely spot on for where the storage industry needs to go regarding implementing solid state.  I just can't help but think: if the interface bottleneck is removed, what is next?  Will storage controller firmware be able to keep up, or will that be the next bottleneck?  Will these new technologies allow connectivity over moderate distances and be used for scale-out architectures to avoid the latency penalties of Ethernet?

  14. Mike James
    December 14, 2011 at 4:13 pm

    Great post, great technical detail and analysis.  One correction:
    "(PCIe 12Gbps 6 lane)"
    The SFF-8639 connector can support either Multilane SAS (12 Gb/s, 4 lanes) OR 4 lanes of PCIe Gen3 (8 Gb/s, 4 lanes).  Two of the lanes can be either SAS or PCIe, so you are correct that there are 6 lanes.
    In the configuration you saw at Discover, it was PCIe Gen3 x4.

  15. Donny Parrott
    December 20, 2011 at 7:37 pm

    Nutanix has beaten HP to the punch. Bye, bye storage arrays…
