My column for the next issue of Storage magazine in the Netherlands, for the Dutch-impaired.
ABOUT THAT DATA EXPLOSION
For the past couple of years, International Data Corporation has been publishing statistics and infographics about the “digital data explosion” that seem to pop up in just about every vendor presentation I see.
The back story on this research is that it is sponsored by a vendor of disk storage array technology. No surprise there.
The further back story is that the sponsoring vendor of the IDC research previously sponsored the “How Much Information” study at the University of California (originally conducted at UC’s Berkeley campus, but now emanating from UC-San Diego for reasons I do not know). The Berkeley researchers said there was a digital revolution, but that the vast preponderance of digital data creation did not fall under the domain of business. Mostly, it consisted of MP3s, DVDs, eBooks, etc. – analog media being converted to digital forms, mainly for non-business consumption.
The sponsor didn’t like Berkeley’s findings because they did not provide the “fear, uncertainty and doubt” (FUD) rationale the vendor needed to encourage business IT folks to buy more enterprise storage capacity. IDC, however, was happy to oblige its customer (the array vendor, NOT its IT consumer clients) with the statistical rationale required to build such a FUD case.
IDC projected massive growth in storage spending to meet the coming data deluge, and even insisted that, even with a 300 percent increase in spending through 2011, most companies would experience a “storage gap” in which the amount of data being created and stored exceeded the available storage capacity. The whole thing had a sort of Dr. Strangelove feel to it (assuming you are old enough to remember the hilarious exchange about a “doomsday machine gap”).
IDC has been proven wrong in its prognostications. Storage spending has not accelerated, but instead has decelerated in this recessionary economy. While this has not stymied the rate at which data is growing in most companies (a guesstimate at best, given the fact that no one actually measures data growth rates in their firms – only how much more capacity they are adding year-over-year, which is not the same thing), there seems to be no desperate scramble to buy more elbow room for burgeoning bits.
Not only did the analyst fail to predict the impact of the recession on storage spending or data growth, it also forgot to consider two dynamics that competent analysts really should think about. One is the Moore’s Law Corollary in Storage.
Since the mid-1980s, disk storage capacity has grown at a rate of about 100 percent every 18 months. A disk of the same size (2.5- or 3.5-inch) will likely hold double the data a year and a half from now.
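To make that trend concrete, here is a back-of-the-envelope sketch (my illustration, not anything from IDC or the column) of what doubling every 18 months implies for drive capacity over time:

```python
def projected_capacity(start_tb: float, years: float, doubling_months: float = 18.0) -> float:
    """Project drive capacity assuming it doubles every `doubling_months` months."""
    doublings = (years * 12.0) / doubling_months
    return start_tb * (2.0 ** doublings)

# On this trend, a 2 TB drive today becomes:
print(projected_capacity(2.0, 1.5))  # one doubling period: 4.0 TB
print(projected_capacity(2.0, 4.5))  # three doubling periods: 16.0 TB
```

Compounding at that rate, capacity per drive grows roughly tenfold every five years, which is exactly why a fixed projection of a “storage gap” ages so badly.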
In the past, this amazing trend reflected a string of engineering improvements in platter coatings, read-write head designs and noise discrimination technologies that seemed to have no end. Even when the industry started to grumble about the supposed superparamagnetic barrier to further growth – the concern being that there was a fixed limit to how closely bits could be packed on a disk track before their polarity began causing “random bit flipping” – smarter ways were found to arrange the bits themselves. Only a few years ago, perpendicular magnetic recording (PMR) technology, in which the magnetic poles of bits are oriented perpendicular to the media rather than parallel to it, was introduced, increasing drive capacities almost overnight to 1 and 2 TB in a 3.5-inch form factor.
In January, FujiFilm and IBM announced the application of PMR technology to digital tape, along with a new barium ferrite (BaFe) recording media technology, which will shortly enable cigarette-pack-sized tape cartridges like today’s LTO-format tapes to store 30-odd TB of data. At about the same time, Toshiba announced further breakthroughs in patterned media and head technologies that should, within 36 months, give us disk media storing an amazing 4 TB per square inch.
Both of these innovations call into question the validity of IDC’s analytical model. Media capacity growth rates like these put the kibosh on the purported storage gap that the analyst claims is right around the proverbial corner. We will shortly have enough space for all of our data, including worthless stuff like most industry analyst reports.
The other thing IDC ignored was the human factor. Given that up to 70% of the data occupying the spinning rust in the world’s corporations is either archival-grade, orphaned data, copies of copies, or contraband, a bit of data hygiene, storage resource management and archiving is really all that is required to prevent Dr. Strangelove’s tale of FUD from being realized. In most companies I visit today, folks with hamstrung budgets are actually doing something about their storage junk drawers, starting with the elimination of dupes and dreck.
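The dedupe step can be as unglamorous as it sounds. A minimal sketch (my illustration, not a tool mentioned in the column) of the first pass at grooming a storage junk drawer: group files by content hash so copies-of-copies can be flagged for review:

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(root: str) -> dict[str, list[Path]]:
    """Group files under `root` by SHA-256 digest; any group larger than one
    holds byte-for-byte identical copies."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Keep only the digests with more than one file: the duplicates.
    return {d: paths for d, paths in groups.items() if len(paths) > 1}
```

A production tool would hash in chunks and pre-filter by file size, but even this crude pass illustrates how much “new capacity” hygiene can recover before anyone writes a purchase order.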
I was delighted to read in InformationWeek recently that idiotic technologies like array controller-based deduplication and thin provisioning are not catching fire as previously predicted. These do not address the real problem of data mismanagement at all. Clearly, without these functions, and without a budget to buy more capacity during these lean years, IT can be counted upon to find smarter ways to groom the current storage junk pile and to reduce the volume of junk data so it can stretch capacity a bit further.
Necessity is the mother of invention, and a recession is a terrible thing to waste.