The MySpace storage monster

By | December 1, 2006

Being a curious storage person (curious about storage, not a "curious person"), I’m always keen to know about other people’s storage environments – which vendors they have and how successful their implementations are, especially at the less traditional but big players such as Google, Yahoo and the like. So I was really interested to read this article about storage at MySpace, which I picked up from the rupturedmonkey.com storage feeds. I’m sure most would agree that MySpace has to be one of the more interesting setups out there, with some pretty huge and interesting demands.

Setting the scene for the article, Jim Benedetto, VP of technology at MySpace, says that at MySpace they "… live or die by how fast they can serve their content….have had every storage vendor in their data center……today’s storage systems simply can’t handle their needs….". Whoooaa! Talk about a wake-up call for the big iron storage vendors out there. I’m sure they’d all love to have their claws into a massive environment like MySpace. Fortunately for the rest, NetApp are the only ones named and shamed in the article.

Although I know that MySpace is an absolute monster when it comes to storage consumption and demands, I personally see this type of environment becoming more and more common. This new breed of companies serving up online digital content appears to have quite different needs to the more traditional customers of big iron vendors, such as government establishments and banks. Unfortunately the article doesn’t go into the specifics of why the current storage systems and vendors don’t meet their needs. Comparatively speaking, the big iron vendors seem to have solutions tailored "quite" well to these more traditional customers. I imagine companies like MySpace are unbelievably more dynamic than a bank – less change control etc (please hire me!!!!!). As an example, the article says that the MySpace data centre in LA had a pile of flattened cardboard boxes the size of a truck that were this week’s delivered servers – maybe a slight exaggeration but an impressive image nonetheless.

I was interested to read that in some cases they have 146GB and 73GB disks with as little as 5-10GB of data on them so that they don’t max out I/O queues and performance on each disk. Sound familiar? It echoes my recent post titled "When bigger isn’t better", where we talked about big disks being fine in some environments but not when ultra-high performance is demanded. They had considered using smaller, 36GB, disks for their environment (I’m not sure about 3PAR, their preferred vendor, but I know of a couple of big iron vendors who no longer list 36GB disks as supported in their top end arrays). However, they ended up going for the bigger disks and use the spare space for snap backups. They don’t go into detail on this, but I would imagine their performance-demanding active data is placed on the outer tracks of each disk, with the snap stuff on the lower performing inner tracks. Sound familiar again? I recently talked about Pillar re-championing this idea of short-stroking a disk and designing a policy based approach to the technique. We talk about all the good stuff here on rupturedmonkey.
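To put some rough numbers on that, here is a quick back-of-the-envelope sketch in Python. The disk sizes and the 5-10GB of data per disk are the figures quoted above; the 2TB active dataset and the function names are my own assumptions, purely for illustration, and have nothing to do with how MySpace actually sizes things.

```python
# Back-of-the-envelope sketch of the capacity trade-off described above.
# Disk sizes and data-per-disk figures come from the article; the 2 TB
# dataset below is a hypothetical value chosen only for illustration.
import math


def utilization(data_gb: float, disk_gb: float) -> float:
    """Fraction of a disk's capacity that holds active data."""
    return data_gb / disk_gb


def spindles_needed(dataset_gb: float, data_per_disk_gb: float) -> int:
    """Disks required when each one holds only data_per_disk_gb of active data."""
    return math.ceil(dataset_gb / data_per_disk_gb)


for disk_gb in (73, 146):
    for data_gb in (5, 10):
        pct = utilization(data_gb, disk_gb) * 100
        print(f"{data_gb} GB on a {disk_gb} GB disk: {pct:.1f}% of capacity in use")

dataset_gb = 2000  # hypothetical active dataset, not a MySpace figure
sparse = spindles_needed(dataset_gb, 10)   # 10 GB of active data per disk
packed = spindles_needed(dataset_gb, 146)  # 146 GB disks filled to capacity
print(f"{sparse} spindles at 10 GB each vs {packed} fully packed 146 GB disks")
```

In other words, spreading the same data thinly buys you many more spindles (and so many more I/O queues) to share the load, at the price of leaving well over 85% of each disk free for something less demanding, such as those snap backups.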

From my experience the monolithic arrays from the big "3 letter vendors" tend to be quite inflexible compared to their mid-tier cousins. And maybe in the past that wasn’t such a hindering factor – after all, in the past the people who needed big arrays tended to be big boring companies. But some of the more modern big storage users appear to need something a little more nimble, without sacrificing the reliability, scalability and performance inherent in the top end monolithic boxes. Will the big storage vendors continue to redevelop and refresh their existing boxes and approaches year on year ad infinitum, all the while heading down the same road without realising that the world around them is changing (in my opinion)??

I’m currently working away from home, so I spend Monday to Friday in rented digs and am finding it increasingly annoying that the TV programmes I have recorded on my Sky+ box (PVR) are not accessible to me while I’m away from home! Why doesn’t my cable/satellite provider offer me a service where I can record my TV while away from home and then access it on the road? I’m sure it won’t be long before they do! The same goes for the data on my laptop. I recently had to borrow my wife’s laptop for a week as my MacBook Pro won’t take a GPRS/3G wireless network card. No probs – I borrow my wife’s laptop only to find that half of the documents I need that week are back at home. Why don’t I have my docs stored up in the cloud somewhere, accessible everywhere? Then there’s my iPod – why do I need a newer model with more internal storage? Why can’t I just store ALL my digital stuff out there on the internet and just pull down what I want when I want it? I know these types of services are out there and maturing, but imagine how big they will one day be – I now have a warm reassuring feeling that people with storage skills are still going to be needed in 10 years.

Surely one day this new breed of digital content companies will become the biggest storage users in the world. One instant benefit I would see from storing all my stuff on the internet would be that I would no longer have to store DVDs, CDs etc out of grabbing reach of my 10-month-old daughter Lily. If all of my content is up there in the cloud then she can’t reach it and scratch it, and I also get more floor and shelf space in my house – instant double win!!

Just to close this post with a teaser of sorts, they also mentioned that they expect MySpace to be totally off tape in the next few months – hmmmm interesting (assuming they aren’t just saying this to sound cool and cutting edge!?)

Nigel

PS. Anyone with any more storage info about these types of companies such as Google, Yahoo, Microsoft…… please feel free to post comments telling us whose kit, and what kit, they are using!

8 thoughts on “The MySpace storage monster”

  1. snig

    Your PVR problem might be solved by a thing called Slingbox. Google it and let me know what you think…

  2. c2olen

    Well Nigel,

    I posted a similar question on Google's shop size a while back, on the RM forum. In the time passed since then, I read an article on the Google File System, which I currently am unable to find.
    The Wikipedia reference http://en.wikipedia.org/wiki/Google_File_System gives a good impression of the magnitude of Google's shop. Check bullet 4 in the reference section.

    Having their own optimized filesystem (not publicly available) must mean you have extreme requirements and needs. In our shop the performance requirements are nowhere near the requirements of the online service providers. Nevertheless we want good performance for our customers.

    As I've said in various previous posts and comments, we are in the middle of acquiring new disk storage systems. 4 big iron storage vendors are participating in the tender. We asked for nothing but 146GB/15Krpm spindles. Some seem pretty stubborn and keep insisting on offering 300GB/10Krpm spindles, because, according to them, these are capable of satisfying our requirements. Yeah right. That's either because their hardware doesn't have enough drive slots to hold the number of spindles needed to satisfy our storage needs, or they are really trying to give us the best price in order to win the deal.
    We've used a modeling tool to check each system's maximum I/Os per second and throughput, and anything less than 15Krpm brings the system to its knees. Those stubborn vendors are likely not to make it to the next round.

    This said, I am really curious about how these big iron vendors go about their offerings when a MySpace kind of shop goes for a renewal 😉 They could never even dare to offer anything less than the absolute best performing setup. Being stubborn could really get you banned from the computer floor.

  3. snig

    You guys should take a look at Caringo. They have a file system that would be perfect for an online provider. I don’t know how much they’ve released so I won’t go into specifics on how their technology works, but it’s very simple and very quick.

  4. Harold

    What MySpace really needs to look at is BlueArc, the only storage vendor capable of handling their IOPS while allowing them to utilize more of their disk, which will save them big money long term.

  5. Nigel (mackem)

    BlueArc – now there’s a company with an interesting product! Do everything you can in hardware but without the functionality restrictions of ASICs and, I believe, support for SSD as well as normal spinning disks!?

    Do you have much experience with them Harold?

  6. snig

    Thanks for the product advertising Harold. Would you care to elaborate any further on exactly how BlueArc’s solution would fit into the MySpace solution?

    Do you guys have global namespace available? How would the performance of many arrays be balanced across all of them as the data grows to billions of files?

  7. Louis Gray

    Snig and Nigel,

    I can’t speak to Harold’s comments, as we don’t have anyone on our staff named Harold. But I can understand the pseudo-anonymous poster’s enthusiasm. Ruptured Monkey is a very good blog, as I communicated to you via e-mail a few weeks ago. It’s always good to learn about what users are seeing and caring about in the world of storage.

    To answer the specific questions that have arisen in response to his thinly-veiled advertisement, BlueArc introduced a global namespace in February when we debuted the Titan 2000 series. In our tests at customer sites, our internal lab efforts, and on SPECsfs, we see no performance impact with multiple nodes, unlike other competitors.

    I am also familiar with customers who have extremely challenging environments with millions of files per directory and billions of files total. This has been seen in the Internet services market and others. BlueArc’s hardware platform enables massive parallelization, and separation of function, without centralized bottlenecks.

    I’ve been at the company going on six years now, work closely with our customers and would be happy to answer questions via e-mail or phone off-line, rather than distract this good thread about MySpace. Keep up the good work and we’ll keep reading.

  8. snig

    Sounds good Louis. Thanks for the response and I’m sure one of us will be in touch with an inquiry.
