I rarely blog about blogs, but StorageMojo has an interesting post I think everyone should read. Basically, SM threw down the gauntlet to storage companies about failure rates in disk drives and related issues — stuff in fact that we have already covered here in some detail. He wanted to get confirmation that all drives are based on roughly the same hardware (no duh) and that failure rates are understated by the industry (no duh) and similar issues well known by everyone in storage, but perhaps missed by members of the great unwashed.
Interestingly, disk manufacturers didn’t take the bait. But NetApp did, and the comments from NetApp’s Director, Technical Strategy, Val Bercovici, are what make the piece so interesting. Talk about opening yourself up for satirical dismemberment…
I look forward to your views and will hold off on articulating mine until some input is received.
I will give you a hint about my take: I laughed till I cried.

{ 11 comments }
It’s interesting that this subject should come up shortly after the Google study results were released. Take a look at
http://labs.google.com/papers/disk_failures.pdf
It makes for interesting reading!!
Nik,
Thank you for the reference… very useful.
LB
That is the paper that Storage Mojo was referring to. Nice to have an in-the-trenches view of reality.
Drive failures? Who cares! Drives will fail – get over it! What you need is good error recovery technology (insert your fav vendor name here), a good backup strategy and, most importantly, (what we call) a Technology Refresh Program (includes more than just disk) that is supported at the highest levels of the org – Disk is refreshed every 4 yrs (max), PCs every 3 yrs, etc.
Also squeeze your Vendors; play them off against each other – we did and our current costs for FC drives are just slightly higher than the street price for SATA.
Some say forget RAID and triplicate everything with cheap SATA drives – I say go right ahead, but for us that’s not cost-effective.
I started reading the NetApp response, but lost interest at the point where he suggested NetApp was a bit like Al Gore – I assumed he meant that the truth would be stretched in the rest of his response…
Hi Jon — loved the commentary.
I wrote a bit about this as well, if you’re interested:
http://chucksblog.typepad.com/chucks_blog/2007/03/much_ado_about_.html
Keep up the good work on the blog!
Jon, thanks for the extremely entertaining thread. Little could I imagine the off-topic “satirical opportunities” you hinted at until today.
At first I was expecting lame political references to Al Gore and Ronald Reagan, but thanks to Chuck we’ve now got EMC referring to “lubricants” in the same breath as 9/11 when discussing Google’s disk failure study.
Priceless…..
Interesting post, Chuck. I am a satirist, not a conspiracy theorist. While I might take exception to product architectures or business value, I do not share the view that there is a vast conspiracy aimed at producing substandard disk.
Heck, I remember when the storage-related stock market took a dive in 2001, shattering the image of NetApp, EMC and others as “invulnerable” to the forces that tore the dotcoms a new orifice. Someone pointed out to me at the time that the “phone home” feature on their EMC gear started phoning home a lot more often than usual and folks were showing up at the door with replacement parts. (Shades of HAL 9000 and that damned AE-35 unit.) Someone told me I should write something about EMC trying to compensate for lower sales by triggering phony maintenance calls. It was bullshit (I’m pretty sure), but even raising the spectre of doubt might have cast aspersions on the value prop of proactive service and support.
Bottom line, I don’t believe there are any conspiracies around disk. Disk failures happen. The more disks you have, the more failures you will see in absolute terms, even if the failure rate per drive stays the same. Deploy a 10,000-disk fabric and, statistically, you will be replacing drives daily or weekly.
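For anyone who wants to check the 10,000-disk arithmetic, here is a rough back-of-the-envelope sketch in Python. The function name and the annualized failure rate (AFR) values are hypothetical, chosen only to show the scale of the effect; they are not figures taken from either study. It also assumes failures are independent and spread evenly over the year, which real drive populations do not strictly obey.

```python
def expected_replacements(num_drives, annual_failure_rate):
    """Return expected drive replacements per year, week, and day.

    Assumes failures are independent and spread evenly over the year.
    """
    per_year = num_drives * annual_failure_rate
    return {
        "per_year": per_year,
        "per_week": per_year / 52,
        "per_day": per_year / 365,
    }


# Hypothetical AFR values, for illustration only.
for afr in (0.01, 0.03, 0.08):
    r = expected_replacements(10_000, afr)
    print(f"AFR {afr:.0%}: {r['per_year']:.0f}/year, "
          f"{r['per_week']:.1f}/week, {r['per_day']:.2f}/day")
```

Even at the low assumed rate of 1%, that works out to roughly two replacements a week; at 8% it is more than two a day.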
I wish I could blame you guys, but in this case, I can’t in good conscience.
Too bad.
Brent, I am still biting my tongue here. I was going to post a funny today, but a comment from some jackass at NetApp made me a lot darker than usual. That — and a kidney stone. (OUCH!)
Why do you think they call it spinning media : ) ?
T.H.O.H. = “The Height of Hypocrisy”
In a thinly-veiled swipe at his nearest competitor, Mr. Hollis begins with the following unfortunate statement:
Some vendors will capitalize on just about anything.
Thinking back to the post 9/11 era, I remember all
the tacky marketing campaigns from data protection
vendors with the unspoken message “it could happen
again!”
I find that sort of marketing very repugnant.
Mr. Hollis forgot there are no “do overs” on the web. I submit the following archive of an entire EMC web page devoted to the “repugnant” marketing of the 9/11 tragedy.
Mr. Hollis already raised my ire by choosing to dishonor the victims of 9/11 with his completely off-topic and unnecessary reference, but by hypocritically editorializing on this sensitive subject in light of the above, Mr. Hollis has proven he is either senile, out of touch with his own department or simply a liar.
Either way, given that Mr. Hollis’ blog never even attempted to address the underlying scientific findings of Google and others, is this really the best EMC can do for a public spokesperson?
As someone who works with members of EMC Customer Support I can tell you that if the bogus phone home fantasy were true you’d have seen it in the news. The headline would read something like “Entire sales organisation slaughtered by irate support staff”. You’d have been able to see the funeral pyres burning from Florida.
Did the person spinning this yarn also tell you that EMC faked the moon landings too? If EMC has to have a conspiracy theory I’d like one I could share with non-technical folks.
As for the disk drive flap, it’s interesting that there wasn’t any failure analysis data of the “failed” drives provided. If oversensitive or crappy firmware marks a drive as bad, yet the manufacturer finds it to be defect free, has the drive actually failed?
In both reports the criterion for failure was that the drive was pulled, but there’s little to no data on whether some other element was to blame. I mention this realising that it’s economically prohibitive to pay for every drive to undergo analysis, and that it’s cheaper just to bin them as they are commodity components, but I’d have loved to have seen what percentage of drive “failures” in the studies had nothing to do with the drives themselves.