Hitachi Virtual Storage Platform – VSP

If you haven’t heard yet, HDS have just announced their next generation enterprise storage array.

WARNING! This article is a technical deep dive. It’s about 3.5K words and is not recommended for light-weights.  If you want to know what it’s all about from a technical perspective, I suggest you grab a hot brew and read on.  If a technical deep dive is not what you want, then scurry on somewhere else.

Still here?  Nice one!  I think you’ll find this the best source of technical info on the VSP available anywhere.

So let’s start with a picture.  This is what a VSP looks like -

VSP-Front

Pretty ugly, right?  However, my mother always told me it’s more important for something to be beautiful on the inside.  So don’t let the ugly front doors put you off your brew ;-)

 

The Name

For those who may not know, HP also sell the VSP via an OEM agreement with Hitachi Ltd of Japan.  So the product has two names, an HDS name and an HP name – but it’s the same product under the hood.

The HDS name – I suppose with the current trend of everything needing a V at the beginning of its name, or the need to include the words virtual or cloud, it’s no surprise that HDS have called it the Virtual Storage Platform, or VSP for short.

 

A couple of points on the HDS name - 

  1. It’s a whole lot simpler than USP V, or is that USPV or USP-V or USPv, or maybe even USP-v…..
  2. I suppose it’s better than CSP, where C would be for Cloud ;-)

Internal Code Names:  Internally at HDS the project was known as Victoria, and prior to that as project Mother Goose (guess it's been that long in coming that it needed two internal project names).  The Mother Goose name I'm not sure about, but the Victoria I have a fair idea…..  The guys at HDS tell me the obvious: they chose the name Victoria as it starts with a V, and it has its beginnings in the Latin for Victory.  However, I’m fairly certain the real reason is that somebody in the team probably had a childhood crush on a girl named Victoria and they never quite got over her ;-)

The HP name – HP have previously named the product XP, with previous versions including the XP1024, XP12000 etc.  So one may have guessed that HP might have named it something like XPV, vXP or XP2048.  But no.  HP are marketing the product as the P9500, bringing it in line with the current HP naming standards – MSA is now P2000, LeftHand is P4000, and XP is now the P9000 series.

Hitachi (日立) Factory Name:  The internal Hitachi factory names for previous generations of this family of product were –

  • Lightning 9900 = RAID400
  • Lightning 9980V = RAID450
  • USP = RAID500
  • USP V = RAID600

The VSP follows suit and is internally referred to as the 日立 RAID700.  If you view the results of a SCSI Inquiry, or SCSI VPD page 0x83, you will see this as the product ID.
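As a quick aside for anyone curious about where that product ID surfaces, below is a minimal sketch (in Python) of decoding the vendor and product identification fields from a raw standard SCSI INQUIRY response, using the byte offsets defined in the SPC standard.  The sample buffer and the strings in it are entirely made up for illustration – they are not real VSP output.

```python
# Minimal sketch: decode the vendor / product ID fields from a raw
# standard SCSI INQUIRY response (byte offsets per the SPC standard).
# The sample buffer below is fabricated purely for illustration.

def decode_inquiry(buf: bytes) -> dict:
    """Extract the T10 vendor, product ID and revision from standard INQUIRY data."""
    if len(buf) < 36:
        raise ValueError("standard INQUIRY data is at least 36 bytes")
    return {
        "vendor":   buf[8:16].decode("ascii", "replace").strip(),   # T10 vendor identification
        "product":  buf[16:32].decode("ascii", "replace").strip(),  # product identification
        "revision": buf[32:36].decode("ascii", "replace").strip(),  # product revision level
    }

if __name__ == "__main__":
    # Hypothetical response buffer - the strings here are placeholders only.
    fake = bytes(8) + b"VENDOR  " + b"PRODUCT-ID      " + b"0001"
    print(decode_inquiry(fake))
```

On a Linux host the raw buffer would typically come from an SG_IO ioctl or a tool such as sg_inq.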

High Level Overview

The VSP is a Fibre Channel only (oh, and FICON) enterprise storage array that scales from a single bay up to a maximum of 6 bays.  The two central bays house what Hitachi is calling Control Chassis (containing processors, memory, front and backend connectivity…) as well as drive enclosures.  The remaining 4 bays can contain only drive enclosures.  It can scale from zero (0) drives to a maximum of 2,048 drives, making it the biggest (Update: 29/09/2010 – biggest from a drive slot perspective) storage array from Hitachi, but still not as big as some of the competition – but size is not everything to all people ;-)

VSP 6 frame diagram

NOTE:  There is no minimum entry model like with previous versions.  Also the smallest config can scale to become the largest with no forklift upgrade required. This is in stark contrast to the previous generation where a USP VM could not be upgraded to become a USP V.

Now to the REAL technical stuff…….

Sub-LUN Tiering

Today’s Holy Grail for all decent storage arrays is the ability to tier at the sub-LUN level.  That is, the ability to have the most active parts of a LUN reside on fast (most expensive)  media, and the less frequently accessed parts reside on slow (cheaper) media.  The VSP has this and it calls it Hitachi Dynamic Tiering (HDT).

Sparing you a lecture on Sub-LUN tiering I’ll just pick the interesting technical points for the VSP implementation -

  1. All drives can be placed into a single tiering pool.  You just throw your SSD, SAS and SATA into an SLMP (Single Large Melting Pot – I made that up).  Not what I was expecting given existing HDP Pool best practices, but I suppose it works from a simplicity perspective.
  2. New data is staged to the highest tier available, and then, as it becomes less active it is migrated down the tiers.  Most folks I speak to seem to initially think it will be best to stage data at the lowest tier and then promote it up through the tiers as it is used.  I guess staging to the highest tier means that you will always be fully utilising your expensive Tier 0.  Time will tell if this approach works.
  3. The Sub-LUN extent size (which Hitachi calls a Page) is the infamous chubby chunk… the 42MB HDP Page.  Basically the VSP will move data up and down the tiers in units of 42MB (contiguous space).  If you have 1MB of a file that is hot, the VSP will migrate that 1MB, plus the remaining 41MB of that Page, up a tier.  The same for demotions.  (There’s a rough sketch of the idea just after this list.)
  4. On the policy side of things, you can set gathering windows, exclusion windows, and the movement cycle is between 1-24 hours.
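To make the mechanics a bit more concrete, here’s a toy sketch of page-based tiering in Python.  It is emphatically not HDT’s actual algorithm – the tier names, thresholds and counters are all invented for illustration – but it shows the basic shape: count activity per 42MB page, stage new pages to the top tier, and move whole pages up or down on a periodic cycle.

```python
# Toy model of sub-LUN (page-based) tiering. Everything here - tier names,
# thresholds, counters - is invented for illustration; it is not HDT.

from dataclasses import dataclass, field

PAGE_MB = 42                      # pages move as whole 42MB extents
TIERS = ["SSD", "SAS", "SATA"]    # tier 0 is the fastest / most expensive

@dataclass
class Page:
    tier: int = 0                 # new pages are staged to the highest tier
    io_count: int = 0             # accesses observed in the current cycle

@dataclass
class Pool:
    pages: dict = field(default_factory=dict)   # page_id -> Page

    def write(self, page_id: int) -> None:
        # First touch allocates the page in tier 0; later touches just count I/O.
        page = self.pages.setdefault(page_id, Page())
        page.io_count += 1

    def read(self, page_id: int) -> None:
        self.pages[page_id].io_count += 1

    def rebalance(self, hot: int = 100, cold: int = 5) -> None:
        # Run once per movement cycle (1-24 hours on the VSP, per the article).
        for page in self.pages.values():
            if page.io_count >= hot and page.tier > 0:
                page.tier -= 1                    # promote the whole 42MB page
            elif page.io_count <= cold and page.tier < len(TIERS) - 1:
                page.tier += 1                    # demote the whole 42MB page
            page.io_count = 0                     # start a fresh gathering window

pool = Pool()
for _ in range(150):
    pool.write(page_id=7)          # hammer one page
pool.write(page_id=42)             # barely touch another
pool.rebalance()
print({pid: TIERS[p.tier] for pid, p in pool.pages.items()})   # {7: 'SSD', 42: 'SAS'}
```

In the real array, the gathering windows, exclusion windows and the 1-24 hour movement cycle mentioned above take the place of the crude rebalance() call here.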

So a tick in the Sub-LUN tiering box for the VSP.  All that remains to be seen is whether it does what it says on the tin.

SAS backend and 2.5 inch drives

The writing has been on the wall for a while now, Fibre Channel as a drive interface is on its way out.  Don’t panic though, it won’t happen overnight, but by the same token, don’t be blind to the facts, it is happening.  The new protocol, interface and backend architecture is SAS (Serial Attached SCSI). 

NOTE:  Enterprise class SSD drives come with SAS interfaces, and you can plug Enterprise class SATA II drives on to SAS backends.

As far as I’m aware, the VSP is the first of the enterprise arrays to adopt a SAS backend, but it’s unlikely to be long before the rest follow suit.  So is there any value in the SAS backend, or is it just a tick in the box to be future proof?

There is an argument that the 6Gbps full duplex SAS backend will be able to sustain more IOPS and throughput to SSD drives, certainly more than a 4Gbps FC-AL (or switched FC-AL) backend.  However, don’t expect to be able to drive the ~45,000 IOPS that the spec sheet of an STEC Zeus IOPS drive claims.  Also note that the spec sheet I’m reading from the STEC website only shows a 3Gbps SAS interface, making it slower than 4Gbps FC at the time of writing :-S  However, when a 6Gbps interface comes along, the plumbing is already there in the VSP.

As far as I’m aware, the only manufacturer supplying SSD drives for the VSP is STEC Inc and I believe the drives to be the same ZEUS IOPS drives (200GB and 400GB varieties) that the competition use.

The pictures below are close-ups of a drive enclosure.  The picture on the left shows the enclosure with the fans (on hinges) swung open, the picture on the right shows it with the fans closed into position -

VSP-Drive-Enc-Open VSP-Drive-Enc-Closed

2.5 inch drives.  For me, 2.5 inch drives are the future of disk drives, and allow for greater capacity in the same footprint.  For example, a single frame VSP with a Control Chassis 0 can have 384 drives, each of which can be 2TB.  That’s a lot of capacity in a relatively small footprint.

It’s also strange that once you’ve seen one system with 2.5 inch drives, all systems that still deploy 3.5 inch drives look clunky and dated.  Bring on the 1cm drives!  J/K

UPDATE 29/09/2010 : Readers should note that capacities of 2.5 inch drives currently lag behind those of 3.5 inch drives. For latest drive info see relevant drive manufacturer websites.

Front End Director (FED) Design

So the front end design of the VSP is significantly different from that of previous generations, and it’s one of the more technically interesting changes.

At a high level, Hitachi have taken the wide-striping concept that is now commonplace in backend architectures, and are bringing it to the front end.

Basically, the FEDs and BEDs (Front End Directors and Back End Directors) have custom I/O routing ASICs that are specialised for I/O traffic management – Hitachi are calling these data accelerator ASICs.  These ASICs have an affinity with ports.  However, the more general purpose CPUs are no longer locked to particular ports, they have been moved to a processor complex called the Virtual Storage Director (VSD) where they are pooled and can have their resources dynamically assigned and un-assigned from any front or back end port.

Example:  In legacy or more traditional architectures (I’m going to keep this high level), Processor 1 would be tied to Port 1 and Processor 2 would be tied to Port 2.  If Port 1 was running like the clappers and maxing out its Processor, while Port 2 and Processor 2 were sitting idle, there would be no way to assign the resources of Processor 2 to Port 1.  Now, with the VSP, the CPUs are pooled and can be assigned or unassigned from any Port, so a port that is running flat out can have the resources of multiple CPUs assigned to it.
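A crude way to picture the difference is a toy model of a processor pool that gets shared out according to port load.  The port names, CPU counts and the proportional-share policy below are all my own invention, purely to contrast pooling with fixed port-to-processor affinity – it is not how the VSD scheduler actually works.

```python
# Crude illustration of pooled processors vs fixed port affinity.
# Port names, CPU counts and the proportional-share policy are invented.

def assign_pooled_cpus(port_load: dict[str, int], total_cpus: int) -> dict[str, int]:
    """Share a pool of CPUs across ports in proportion to their current load."""
    busy = {p: l for p, l in port_load.items() if l > 0}
    total_load = sum(busy.values())
    return {p: max(1, round(total_cpus * l / total_load)) for p, l in busy.items()}

# Legacy model: one CPU nailed to each port, regardless of load.
fixed = {"port1": 1, "port2": 1}

# Pooled model: port1 is flat out, port2 is idle, so port1 gets the whole pool.
pooled = assign_pooled_cpus({"port1": 9500, "port2": 0}, total_cpus=4)

print("fixed  :", fixed)    # {'port1': 1, 'port2': 1}
print("pooled :", pooled)   # {'port1': 4}
```

The output is the point: in the fixed model the busy port is stuck with its single processor, while in the pooled model it picks up the whole pool.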

Of course the proof is in the pudding.  BUT, this could potentially do to the front end, what drive pooling and wide-striping have done to the backend.  Look at a heat map of a traditional backend where specific spindles are assigned to specific LUNs and applications, and compare it with a heat map of a system that employs drive pooling and wide-striping.  The difference is amazing.

This could also have potential heat and power benefits, turning off processors when they are not required – although I’m speculating now.

ASIC vs Commodity: The general purpose CPUs, where the microcode (including copy services, replication services, HDP etc.) executes, are Intel quad-core Xeon processors.  The ASICs on the FEDs and BEDs are Hitachi designed.  Hitachi, like 3PAR, obviously feel there is still value in custom silicon.  This is in stark contrast to the likes of EMC VMAX and IBM XIV, which have taken the commodity route and drive their value out of software.

The custom ASICs take care of latency sensitive data such as user data, whereas less latency sensitive data, such as asynchronous replication traffic, is processed by the general purpose CPUs on the VSDs.

While on the topic of the front end, I should point out that this is an FC and FICON only array.  No iSCSI (no plans for it) and no FCoE.  However, FCoE is apparently not far off, maybe the end of Q4 or early Q1 2011.

Control Chassis (Logic Boxes)

As mentioned earlier, and as seen in the images below, the centre two cabinets in a six frame VSP house the Control Chassis – Control Chassis 0 and Control Chassis 1.  The pictures below are of the front and rear of a frame containing a Control Chassis.  The Control Chassis is in the bottom half in both pictures; the top half contains the drive enclosures hidden behind the fans.

VSP front with doors open VSP rear with doors open

And a close-up shot of each – note the slapdash fibre cabling – tut tut ;-)

Control Chassis Front Control Chassis Rear

As will be shown in the diagrams below, the Control Chassis contain the following -

At the front there are 4 x Virtual Storage Directors (VSD), and 8 x Data Cache Directors.

VSP Visio of front

At the rear there are 8 x FEDs, 4 x BEDs, and 8 x what Hitachi are calling Grid Switches, or GSW if you like.

VSP Visio of rear

Data cache is backed up to onboard flash drives, which I believe will mean no requirement for large batteries etc.  Pretty cool – and it has since been confirmed that this is correct: large batteries are no longer needed to hold cache through a power loss.

Each BED has 8 x 6Gbps SAS paths.  That adds up to 32 backend SAS links per Control Chassis, and 64 x 6Gbps links in a fully configured unit.  SAS runs at full duplex. 

SAS backend
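Taking those numbers at face value, the raw backend line rate of a fully configured system works out as below.  This is a back-of-the-envelope figure only – it ignores encoding and protocol overhead, and says nothing about what the drives behind the links can actually sustain.

```latex
% Raw backend SAS line rate, before encoding/protocol overhead
\begin{align*}
\text{links} &= 2~\text{Control Chassis} \times 4~\text{BEDs} \times 8~\text{SAS paths} = 64\\
\text{one direction} &= 64 \times 6~\text{Gbit/s} = 384~\text{Gbit/s} \approx 48~\text{GB/s}\\
\text{full duplex} &\approx 2 \times 48~\text{GB/s} = 96~\text{GB/s}
\end{align*}
```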

Control Memory (CM) is located on the Virtual Storage Directors alongside the general purpose CPUs, making it very quickly accessible to the CPUs, much like an L2 cache.  As with previous architectures, Control Memory stores the usual metadata and system state such as LDEV mappings, DIF tables, run tables etc.

Grid Switches – The Grid Switch boards provide the 4-lane PCIe Gen 1 paths that cross-connect all of the other boards.  Each Control Chassis can have either two or four GSW boards.  Each GSW board has 24 ports, each port having a send and a receive path, with each path operating at 1024MB/s.

Hitachi refers to the two Control Chassis as being tightly coupled, and the interconnect between them is PCIe Gen 1 over copper.  The Control Chassis are able to communicate by mapping Hi-Star over PCIe.

 

Microcode and Microcode Updates (interesting)

As you would hope and expect, a large part of the VSP microcode is inherited from the USP V, which in turn inherited it from the USP, which inherited it from the Lightning 9980V, which inherited it from ……  The code from the USP V was recompiled to run natively on, and exploit the advantages of, the new Intel processors.  All new features of the VSP, such as Dynamic Tiering, were developed to run natively on the VSP hardware.

Interestingly, the microcode now runs in fewer places – basically just the Virtual Storage Directors.  This hugely simplifies the microcode update process.  On that topic: it’s not very often that I get excited about the microcode update process (in fact I’ve never gotten excited about it before), but this time I’m excited!  Stay with me on this…..

If we remember back to when we talked about the general purpose CPUs (where the ucode runs), we mentioned that the processors are not tied to front end ports.  Well, as it turns out, this has a huge impact on microcode updates.  Basically, because the processors that run the microcode are not tied to ports, when you update the microcode no ports ever have to go offline!  Take a second to let that settle…. once again, front end ports do not go offline during microcode updates!  There are other architectures out there that require paths to go offline during code upgrades, and these are an outage waiting to happen.  I’m not saying there won’t be a reduction in performance, but this is still HUGE!

Real world experience:  Anybody who works in large environments knows that microcode updates can cause issues.  Path failover and multi-pathing in general are often the cause.  Common examples include servers that incorrectly have both paths cabled to the same SAN fabric, not noticing that a path is already in a failed state on a server, applications that don’t work well with path-failover…  So to have a solution where paths are not taken offline during a microcode update is an absolute god-send for large organisations – IMO.

Data Centre Friendly

This might seem strange, but for products that are designed to spend their entire lives in data centres, enterprise storage arrays are often like a wart on the nose of a well designed Data Centre.

In the past, it has almost been a pre-req for enterprise storage arrays to have bespoke, custom sized cabinets and to blow hot air in whatever direction they felt like – sometimes towards the ceiling ;-)

Well, finally, the VSP has made some steps in the right direction -

  1. It comes in a standard sized 42u 19 inch rack
  2. Has a hot and cold aisle airflow design
  3. Can take power feeds from above or below – you can now install one in your garage without installing a raised floor ;-)

VSP-rack-sizes

Not quite at the point where customers can install one in their own pre-provisioned racks, but it’s moving in the right direction.

Hitachi are also claiming ~40% power reduction, which, if true, will go down well with every Data Centre I know.

But it Looks So Ugly

Let’s face it, most of us have a shallow side to us and like stuff that looks good. 

But before I get lambasted over this, let me state that I know that appearances are not that important when it comes to kit that spends its life in a data centre and never sees the light of day.  However, I think there are a couple of areas where it does matter (but way, way less than how well it works etc…).  Those couple of areas are -

  1. First impressions count.
  2. It’s about brand.  Hitachi are no doubt billing this as an exciting product and trying to generate interest – at the end of the day they need to ship it in large quantities.  In my opinion, the outward design does not give a good first impression and certainly doesn’t capture the imagination.

VSP vs VMAX

Although HP didn’t actually put a blue neon light on the front door, their marketing guys seem to be getting the message with the photo-shopped image of the HP StorageWorks P9500 below -

HP-P9500

In a day when the competition are sporting kit with dual-power-fed blue neon lights and others with bezels designed by Armani, I was hoping for more from Hitachi.  But it’s not the end of the world and I think I’ll live, just about.

However, on the inside it certainly looks neater and tidier than previous generations.  This, again, is important as it speaks of good engineering.

Management Software

Moving on from something that’s not all that important, to something that absolutely is…management software.

Apologies for mentioning this so far down the post, but I haven’t actually tested the software, and I have never been a fan of Hitachi storage software, so I find it hard to get excited.  Having said that, it looks for once like they might have come good on management software …

A previous poll on this website about how good or bad Hitachi Storage Management software is showed the following results -

  • 33% said it was “Poor”
  • 25% said it was “Average”
  • 14% said it was “Good”
  • 15% said they would “rather pull their teeth out than use it”
  • 8% said it was “excellent”.  These voters no doubt worked for Hitachi ;-)

The Victoria launch is a joint Hardware and Software launch.  Device Manager is now at version 7, and it appears that Hitachi might have been listening to feedback. 

From what I've seen, aesthetically speaking, it's a huge improvement on previous versions.  Like other decent management GUIs it’s based on Adobe Flex technology and it actually works how you would expect a decent management app to work (I’ve seen a demo).  You know, like being able to resize the screen, move columns, re-order columns… 

Let's hope that it also scales and doesn't need "occasional" reboots.

Hitachi are positioning it as a unified (there’s a word that is finding its place alongside virtualisation and cloud) management platform across their storage products – VSP, HCP, HNAS, AMS…  Hitachi have also spent a lot of effort on building in SMI-S.

My personal opinion is that the management interface looks a lot better – in the same ball park as the likes of SMC, but not yet in the same league as XIV, 3PAR or Unisphere.

Still Legacy RAID Architecture

As with most things in life, you don’t get everything you hoped for.  And for me, the big thing that is missing is a revamp of the underlying RAID architecture.  With the long wait for the product I was hoping they were working on this.

In today’s world of 2TB and larger drives, an average Bit Error Rate of around 1 in 10^14 (and therefore a risk of an Unrecoverable Read Error something like once in every 12TB read), elongated rebuild times, larger capacities to try and background scrub…… I’ll kill the list there so as not to keep you awake at night worrying about your RAID.  Point being, RAID architectures that were written 10, 15, 20+ years ago are showing clear signs of creaking at the seams.
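For anyone wondering where the 12TB figure comes from, it falls straight out of the published spec: an error rate of one unrecoverable bit error per 10^14 bits read converts to roughly one URE per 12.5TB read, which is uncomfortably close to the amount of data you touch when rebuilding a shelf of 2TB drives.

```latex
% One unrecoverable read error per 10^{14} bits, expressed in bytes read
\frac{10^{14}~\text{bits}}{8~\text{bits/byte}} = 1.25 \times 10^{13}~\text{bytes} \approx 12.5~\text{TB read per URE (statistical average)}
```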

In my personal opinion, distributed parallel RAID architectures that offer ultra fast but ultra low impact rebuilds (or re-protection) are needed.  I personally don’t want to see triple parity RAID or other band-aid solutions.  A redesign is needed.  Technologies like some of those sported by 3PAR, XIV, Xiotech etc need to make their way into enterprise storage arrays.  Rant over.

Misc – VAAI and Encryption

VAAI.  Unfortunately there is no VMware VAAI goodness on day one.  However, this will apparently come in the first code rev planned for either the end of this year or very early next year.

Encryption.  The Back End Directors are capable of encrypting data at rest, internal to the VSP, with XTS-AES 256-bit encryption.  Encryption is done in hardware with apparently no overhead, so it won’t impact your performance.
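For anyone unfamiliar with the mode, here’s a minimal software sketch of XTS-AES using Python’s cryptography package – purely to show the shape of the thing (a data key plus a per-sector tweak), not how Hitachi’s BED hardware implements it.  The key sizes and sector numbering here are my own illustration.

```python
# Minimal XTS-AES sketch using the 'cryptography' package. This only
# illustrates the mode (data key + per-sector tweak); it says nothing
# about how the VSP's BED hardware actually implements encryption.
import os

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

# AES-256 in XTS mode takes a double-length key (2 x 256 bits = 64 bytes).
key = os.urandom(64)

def xts_encrypt_sector(key: bytes, sector_number: int, plaintext: bytes) -> bytes:
    # The tweak is conventionally the sector/block number, little-endian, 16 bytes.
    tweak = sector_number.to_bytes(16, "little")
    encryptor = Cipher(algorithms.AES(key), modes.XTS(tweak)).encryptor()
    return encryptor.update(plaintext) + encryptor.finalize()

def xts_decrypt_sector(key: bytes, sector_number: int, ciphertext: bytes) -> bytes:
    tweak = sector_number.to_bytes(16, "little")
    decryptor = Cipher(algorithms.AES(key), modes.XTS(tweak)).decryptor()
    return decryptor.update(ciphertext) + decryptor.finalize()

sector = os.urandom(512)                         # one 512-byte "sector" of user data
ct = xts_encrypt_sector(key, sector_number=42, plaintext=sector)
assert xts_decrypt_sector(key, sector_number=42, ciphertext=ct) == sector
```

The per-sector tweak means identical plaintext blocks encrypt differently in every sector without storing a per-sector IV, which is why XTS is the standard choice for encrypting data at rest on block storage.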

I’m also interested in the overlap with the AMS line.  VSP will scale as small as, or smaller than, AMS but with much richer (pun intended) features.  It seems an industry trend for midrange architectures to be reaching up into the enterprise space, whereas this seems to be going the opposite way.  Hmmmm…

Well, I’ve about worn my keyboard out so will bring this to a close.  Hope it has been useful.

Feel free to leave comments, thoughts, questions… but please, please, please, if you work for a vendor, please disclose this. Thanks!

You can talk to me online about technology on Twitter, I’m @nigelpoulton, or http://twitter.com/nigelpoulton

Disclaimer. I do not work for Hitachi, HDS, HP or anyone involved in the product. I have, however, been contracted in professional services and implementation architect type roles by both HDS and HP in the past. Opinions expressed in this article are my own and do not represent those of any of my employers, past, present, or future.  I have obviously had exposure to the product and people prior to product release, see previous posts for info. I have not received a penny in relation to this product.

68 comments for “Hitachi Virtual Storage Platform – VSP”

  1. IvanE
    September 27, 2010 at 12:34 pm

    Good post Nigel, and well timed.

    You’re right in your belief on large cache batteries – they are gone.

    And power consumption looks way better now – 2.5 drives, no batteries, single phase power – all gives you the same power consumption with 2048 drives as USPV had with 1152.

    Note: Ivan works for HDS

  2. September 27, 2010 at 1:19 pm

    Thanks for this very detailed post — very helpful to many of us.

    Any indication of how the technology rolls out and becomes available?

    Thanks –

    Chuck

  3. September 27, 2010 at 1:38 pm

    Ivan, thanks for the clarification re no need for the large batteries at the bottom of the rack.

    Chuck, thanks for reading. That’s not a question I can answer, guess we’ll have to wait and see what comes out of HDS and HP.

  4. Chris
    September 27, 2010 at 1:46 pm

    (hp employee)
    > 8% said it was “excellent”. These voters no doubt worked for Hitachi
    .. or HP.

    The “excellent” scores you get on the hds software is in my case, because I dont use (advanced) commandview software, to map luns etc, but directly connect via rdc, remote desktop connection, on a xp24k, to the “svp” and then go via “install” to lun manager to do “the works”. And that “windows alike” interface is just “excellent”. copying “ctrl-c’ng” luns from 1 xp port and dragging/pasting “ctrl-v” the luns to another xp-port, what more do you want. ;)

  5. September 27, 2010 at 3:02 pm

    That is one beautiful array. Belongs in a datacenter that has no lights in it.

    On a positive note, I like the sub-lun tiering. A weird size to pick, but none-the-less, a great feature.

  6. Doug
    September 27, 2010 at 3:30 pm

    Excellent post. Spades of great information, as usual, Nigel.

  7. September 27, 2010 at 3:38 pm

    Roman, I don't disagree that it's a bit ugly ;-)

    In future I'd appreciate it if you disclosed your employer though – I know your parent company has a great policy on being open in situations like this.

    Thanks for commenting.

  8. September 27, 2010 at 4:31 pm

    Great overview – thanks!

    No dedup?

    Also, the 2TB drives appear to still be 3.5" (based on the specs on HDS site) but your write up indicates that they may be 2.5" can you clarify drive types/sizes?

    Thanks again.

  9. September 27, 2010 at 4:44 pm

    BTW I work for Compellent by way of disclosure…. sorry! :)

  10. IvanE
    September 27, 2010 at 4:49 pm

    >Also, the 2TB drives appear to still be 3.5″ (based on the specs on HDS site) but your >write up indicates that they may be 2.5″ can you clarify drive types/sizes?

    2TB SATA drives are 3.5″ as well as 400GB SSD.

    2.5″ drives are 146, 300, 450, 600GB SAS, 200GB SSD.

  11. vMAX customer
    September 27, 2010 at 7:49 pm

    Great article, landed here from a twitter link. Much tweeting about how Sub-lun tiering is already here for vMAX. That may be true, but we'll have to convert from thick to thin pools to take advantage of it.

  12. September 27, 2010 at 8:29 pm

    vMAX customer,

    1.  Sub-LUN tiering is not here for VMAX yet, at least not on GA code.

    2.  Agreed its tricky to get from thick devices to thin pools (VP).  Trick is to go for thin pools (VP) from the outset so that you are Sub-LUN ready.

  13. September 27, 2010 at 8:37 pm

    @johnddias – Nope no dedupe – how I would love dedupe in an enterprise array!

    BTW I know who you work for ;-)

  14. Pingback: On the Balance
  15. September 28, 2010 at 1:46 am

    Nigel – you are correct. Sincere apologies. I do work for EMC, although that in no way dictates my appreciation of aesthetics or lack there-of within the VSP.
     
    Great post though, although do you have any information on the scheduling behind the tier-moves?
    Cheers.

  16. IvanE
    September 28, 2010 at 3:27 am

    > Much tweeting about how Sub-lun tiering is already here for vMAX

    Can you please point where exactly V-Max customer already can have Sub-lun tiering?

    Thanks.

  17. digduggo
    September 28, 2010 at 6:28 am

    Great post nigel, finally we get the inside word from the recent HDS trips. 
    On the 2.5 inch drives, I understand the power consumption savings per drive are due to them being a 10K drive not 15K. Performance stats are claimed to be equivalent due to the smaller area of the platter and therefore shorter seek times…makes sense.
    Bring on HDT!! (and maybe some green stick on LED's with the upgrade microcode CDs to spruce up the flagship!).
    ps – I am doing some work for HDS at present including software installs. Device Manager 7 and the newly re-badged Command Director look like a step in the right direction, planning to upgrade some current HDvM clients asap and get some insight into improvements from the field.

  18. Chris Z
    September 28, 2010 at 8:47 am

    They should have put some Mosaic on the doors, it would have looked more futuristic :-)
    And while sub lun tiering is nice but with their 48MB chunk size compared to 768KB chunk size that EMC provides on the Symmetrix line, hmmm i doubt it's really an advantage, not to mention it took them 1.5 years to come up with when EMC already has it on their mid-range!
    Cheers!

  19. September 28, 2010 at 8:56 am

    @ digduggo – Thanks for reading and glad you found it useful.

  20. September 28, 2010 at 9:04 am

    Chris Z,

    Hi Chris.  A couple of points in response -

    1.  I feel it's in everybody's interest if you disclose that your employer is EMC. It just gives the perspective of where you're coming from. I disclosed any relevant info on my side in the article.  EMC has a good policy on blogging and social media, don't let your side down by trying to hide who you work for.

    2.  While EMC Symmetrix VMAX uses a 768K extent size for Virtual Provisioning, EMC have not announced the extent size that will be used for FAST VP.  I will eat my hat if FAST VP moves data at 768K chunks.  I expect FAST VP will be very good, but I think you will find it does not migrate up and down tiers in 768K chunks.

    3.  If 42MB is large, then what does that make the 1GB extent size that CLARiiON uses?

    4.  And your point about it taking Hitachi 1.5 years since EMC have had it in their midrange (I'm unsure about this but will take your word). You must surely know that EMC do not yet have this (Sub-LUN tiering) in their enterprise VMAX box! Surely you know this!?!?

    Love to hear back from you.

    Nigel

  21. IvanE
    September 28, 2010 at 9:04 am

    >not to mention it took them 1.5 years to come up with when EMC already has it on their >mid-range!

    Sub-LUN tiering on EMC mid-range for 1.5 years?

    Wasn’t it released only couple of weeks ago as a part of Q3 midrange launch?

  22. IvanE
    September 28, 2010 at 12:46 pm

    BTW, Nigel, isn’t it time for a new vote? ;)

  23. September 28, 2010 at 1:20 pm

    Lengthy but great post!
    Regarding your statement:
    >In today’s world of 2TB and larger drives, an average of around 10^14 Bit Error Rate
    >and therefore risk of Unrecoverable Read Error being something like one in every
    >12TB, elongated rebuild times, larger capacities to try and background scrub
    This is so true for consumer hard disks such as the WDC Green series, where the Bit Error Rate is at 10^14, but for enterprise grade hard disks such as the WDC RE3/4 the Bit Error Rate is at 10^15, pushing an error out to something far beyond every 12TB. So no issue running a RAID5 with more than 6*2TB disks for example :)
    Now you have a good point about the aging RAID types and your wish to see soon technologies from 3PAR and Xiotech to make their way through the enterprise. And maybe sometimes soon we can see redundancy set at the block level instead of an entire metavolume/LUN/Pool…
    Cheers,
    Didier

  24. September 28, 2010 at 2:15 pm

    PrioNet,

    Thanks for pointing that out. You are correct. However, and in my opinion this is a big however…… these are just the drive manufacturer published specs. I appreciate that the drive manufacturers may say that these are even conservative specs. However, there is a body of evidence that shows that the risks of a URE can be significant despite the published specs. Drives from the same batch may be more prone to URE than others. Once one URE occurs the chances of another on the drive significantly increase (the increased risk is also often in very close areal proximity to the first).

    It’s a great debate but in my opinion proves that we need different protection and recovery mechanisms as drives increase in size. Apparently Claus Mikkelsen from HDS was yesterday stating that in a few years we will have 50TB drives. No talk of how we protect them though :-S

    Nigel

  25. Roberto (HDS)
    September 28, 2010 at 5:17 pm

    There were more comments on the lack of blue lights than on what the system is capable of.  Beauty is in eye of the beholder and we had no plans to be on some TV show, no need for additional power wasting, Wal-Mart style, blue lights (or other colors).
    The 42MB page size has proved to be functional and not wasteful at all with the current Dynamic Provisioning (that was 1.5-2.5 years ahead of some three letters word company….just for the record); listen to the customer comments. It will not be a problem now with the new version of dynamic tiering.  No one mentioned all the other factors that make up the new system…like size, scalability, PERFORMANCE, ability to virtualize…all conveniently forgotten, isn't it?  One more thing, all that was announced is available….

  26. September 29, 2010 at 1:03 am

    Nigel,
    Nice post and covers some of the highlights. It hasn’t scratched the surface yet of what this box is able to do. There are some additional goodies.
    FCoE support, changed cache management (very, very fast in comparison with previous generations), one global cache so no shared memory anymore, as well as automatic destaging to internal SSD on power down, and automatic or manual resource assignment to CPUs/cores. Doubling the backend speed of the USP-V and lots more goodies.
    Secondly, your statement of FE ports going offline during MC upgrade is incorrect. This depends on the load of the MP. If the load of the MP on the USP(-V) is at or below 50%, the IO will be handled by its neighbour MP during reload of the MC. This is measured before the MP is due to be upgraded and if it is busy the engineer will be notified of this. This has been the way since the USP back in 2005, so in that sense nothing has changed. What has changed, as you stated, is that there is a separation of ports and CPUs now which allows some more flexibility.
     
    Cheers
    Erwin (HDS employee)

  27. Steve
    September 29, 2010 at 10:01 am

    Can we have more than 3 level-1 SI/TC copies of an LDEV yet (he asks, knowing he'll be disappointed)?

  28. September 29, 2010 at 3:52 pm

    Hey, were you at the launch in Santa Clara? I could've sworn I heard your voice somewhere within the crowd.

  29. September 29, 2010 at 4:10 pm

    Steve,

    I've asked the question about more than 3 L1 ShadowImage copies. Like you, I expect the answer is No.

    However, when I was in Japan a while ago they were telling me that RAID Manager/HORCM configuration was going to be much simpler…. Haven't seen that though.

  30. September 29, 2010 at 4:12 pm

    Steven Ruby – Hi Steven, long time no hear.  No, I wasn't at the launch.  Got my dad staying with me at the moment so couldn't go.

    Was it any good?

  31. Fred Yang
    September 29, 2010 at 7:05 pm

    Nigel,

    Regarding "In legacy architectures (I’m going to keep this high level), Processor 1 would be tied to Port 1 and Processor 2 would be tied to Port 2.  If Port 1 was running like the clappers and maxing out its Processor, while Port 2 and Processor 2 were sitting idle, there would be no way to assign the resources of Processor 2 to Port 1." in the post.
    That's not true; in the HP XP24K with a certain system mode turned on (sorry, I forgot the SOM id), Port 1 actually can borrow Processor 2 on Port 2 to process the I/O when its load is above 75%. I'm actually running a few arrays in this mode and we did see the I/O from one single host push multiple MPs' utilization to 80%.

  32. September 29, 2010 at 7:31 pm

    the Launch was good. the after parties were great. :-)
    we'll see how quick they can turn my order around for a couple of them.
    ###
    as far as SI and horcm goes, the whole idea of command devices and clunky horcm files will go away. that's all i can say about it at this point though.

  33. September 29, 2010 at 7:50 pm

    Hi Fred,

    I'm not aware of that mode.  Are you saying that a port can borrow resources from multiple other processors?  OR is this mode restricted to processors on physically adjacent ports etc?

    Knowing the XP24000 fairly intimately, including the FED diagrams etc., I can't see it being particularly flexible…?

    Happy to stand corrected if you can provide more detail.

    Also, I will reword my article – when I said legacy architectures, I wasn't specifically referring to previous generations of this product, I was intending to include other array vendors and their architectures.  The reason I kept it high level was to cover the vast majority of arrays on the market.

    Appreciate your input.

  34. September 29, 2010 at 8:10 pm

    Hi Erwin,

    Thanks for your input.

    As far as I'm aware there is no FCoE today.  That will come ~60-90 days from initial launch.

    I'd be interested in any specifics you can share relative to your statement "changed cache management (very very fast in comparison with previous generations)".  Every vendor says they have the best cache management algorithms etc.  You need to back statements like that up with detail.  Are you referring to caching algorithm improvements or improvements in the physical architecture that place cache in more favourable locations etc..?

    Also, I'm aware of the Non-Stop SCSI microcode upgrade process on the USP and USP V.  However, your own guys seemed excited about the possibilities due to the new processor pooling and lack of port<–>processor affinity.  I think it's looking a lot more flexible now.

    Nigel

  35. September 29, 2010 at 8:13 pm

    Steven Ruby,

    Yes, it was suggested that there were improvements in relation to HORCM etc coming in the future, but at the time we didn't have time to delve into the detail.  Do you have more detail?  Not asking you to share anything NDA, just curious if you have any solid info or if it's something that is a little way off..

    Nigel

  36. Fred Yang
    September 29, 2010 at 9:12 pm

    Yes, Nigel, you are correct, this mode (SOM 672) will only allow the processor to borrow resources physically adjacent to the ports.

  37. September 29, 2010 at 9:44 pm

    Nigel, email me at my work address. I think you have that one.

  38. September 30, 2010 at 12:33 am

    Hi Nigel,

    Yes, FCoE support is not at initial launch. Everything is already there but needs the usual qualification and regression testing. Being a rather new protocol, some openness in the implementation specs might cause some trouble if this is not sorted out. You know Hitachi, better safe than sorry.

    W.r.t. cache management the architecture has changed as you've written. There is no separation of SM and CM anymore but everything sits in one global cache. This overcomes any bandwidth restriction that SM could possibly have. Also the housekeeping tables are now located and distributed right next to the CPU in L3 cache. You can't get any faster. :-)

    As for the algorithm itself, I can't explain that here. It's Hitachi confidential, as you might guess.

    You are right that we're pretty keen on the MP/port separation as this gives us a lot more options at the software level. My point was just to emphasize the non-disruptive code load on the current products as well. In that sense it hasn't changed.

    Cheers
    E

  39. October 1, 2010 at 2:20 pm

    "There is no separation of SM and CM anymore but everything sits in one global cache."
    -Erwin Van London (HDS)
    Control Memory (CM) is located on the Virtual Storage Directors alongside the general purpose CPUs making it very quickly accessible to the CPUs, like an L2 cache.  As with previous architectures, Control Memory stores the usual metadata and system state such as LDEV mappings, DIF tables, run tables etc
    - Nigel
    Are these statements not contradicting each other?
    And if the stuff that used to reside in shared memory is now going to be in cache, then you are defying your own theory that it's good to keep them separate.

  40. October 1, 2010 at 8:29 pm

    Hi Biju,

    I think I see the confusion. 

    Previous versions of the technology, such as the USP and USP V, had dedicated Cache Memory boards (CM) and dedicated Shared Memory boards (SM).  There are no more dedicated Shared Memory boards; the control data which used to sit on the SM boards now sits on DIMMs on the VSD boards.

    User and control data is still separated, but there are no more dedicated SM boards.

    The 8 x Data Cache Directors that I identify in the Visio diagram in the Control Chassis section of the article store user data.  By user data I mean application data being fetched from or stored to disk.  There is no control data stored here.

    Does this clarify?

  41. Craig (HDS)
    October 1, 2010 at 8:51 pm

    Hi Nigel,
    You said:
    > The Victoria launch is a joint Hardware and Software launch.  Device Manager is
    > now at version 7, and it appears that Hitachi might have been listening to feedback. 
     
    Instead of might, I would say we do listen to feedback and take that feedback seriously as we enhance the software.  For instance, many have complained about the wait times for a task to complete, which is why we've added a task management system. Within a couple of seconds of clicking the "execute" button, the task is put into the background and the operator can do other things in the GUI while the task executes and completes in the background.  There's been a lot of talk of the software being improved, but I haven't seen anyone mention the task management system yet, and it's a great feature to have in our software.
    Regards,
    Craig

  42. October 1, 2010 at 9:58 pm

    Steve, unfortunately we are stuck with the previous limit of 3 x level 1 ShadowImage copies. Shame!

  43. October 2, 2010 at 5:37 pm

    Thanks Nigel – I must say at present this is the best blog on the VSP I have read so far. Very informative. 

  44. JRW
    October 4, 2010 at 8:15 am

    Some people seem to take the view that using a particular page/chunk size is some kind of silver bullet. There is a simple trade off; smaller chunks mean less wasted space per device, better profiling and less data to move to optimise placement; larger chunks reduce the metadata involved, but have higher waste (unutilised space in an allocated chunk), weaker profiling and more data to move. A 1GB unit has hardly any overhead (metadata), but is weak from an optimisation standpoint (very poor granularity), will on average see 500MB of over-allocation per device, and is a big piece of data to move to improve performance. The sub-MB schemes are scary in the volume of metadata involved and simply will not scale well given current (and near future) processor/bandwidth capabilities. 42MB is 50 times more efficient (and therefore more scalable) in terms of its metadata, wastes a modest ~20MB per device, is granular enough to give some real value to tiering, but small enough that when data needs to be moved it doesn't suck up all the resources. Given the current state of processor capabilities and bandwidths involved I think it is a reasonable compromise.
    There is theoretical work that has been done in places like Cambridge on variable granularity to reduce the problem of metadata limiting scalability, but this will probably first see light of day in wide area file systems rather than the more tightly coupled block based arrays.
    I totally get why you place data on the top tier to start. The alternative is that any data that needs high performance will always have to be migrated from a lower performance tier that is showing it can’t cope. I haven’t seen the detail of how it works on the P9500/VSP in terms of watermarks etc., but it is the right starting point in principle.
    BTW I was told the 42MB size came from the search to find a size that worked well with all current and road map drives using any supported RAID level. Sounds like a very reasonable approach if true.
    Disclosure: I current work in the HP channel and in the past worked for both HDS and Dell.
    PS Thanks Nigel for another great post.
