Data Reduction

I know, I know.  As a tech writer and blogger, I should be immune to the occasional pushback from readers who hold views that differ from my own.  I usually have a pretty thick skin, but I really hate it when I take the time to respond to commenters only to have the comment facility into which I type my response fail when saving my work.  This just happened when I spent 35 minutes writing a detailed response to the many comments received on an article I had written in January on de-duplication and compression for TechTarget’s SearchStorage.

While one or two comments agreed with my perspective, several commenters disagreed vehemently or sought to add other perspectives that weren’t part of my coverage.  Aside from being accused of shoddy reportage by a commenter who referred to himself as TheStorageArchitect, most of the criticisms stayed on topic.  I spent quite a bit of time crafting a response and, since the response facility failed me, here is a synopsis of what I wrote.

First, a bit of background.  The article was part of a series of tips on storage efficiency.  I had argued in the series that storage efficiency came down to managing data and managing infrastructure so that we can achieve the twin goals of capacity allocation efficiency (allocating space to data and apps in a balanced and deliberate way that prevents a dreaded disk full error from taking down an app) and capacity utilization efficiency (allocating the right kind of storage to the right kind of data based on data’s business context, access/modification frequency, platform cost, etc.).

In this context, I argued that some vendor marketing materials and messages misrepresent the value of de-duplication and compression — technology that contributes on a short term or tactical basis to capacity allocation efficiency — as a means to achieve capacity utilization efficiency.  I wasn’t seeking to join the tribalistic nonsense out there, claiming that XYZ vendor’s de-dupe kit was better than vendor ABC’s de-dupe kit.  My key points are as follows.

  1. De-duplication remains a proprietary, rather than a standards-based, technology.  That makes it a great value-add component that hardware vendors have used to jack up the price of otherwise commodity hardware.  I cited the example of an early Data Domain rig with an MSRP of $410K for a box of about $3K of SATA drives whose price tag was justified on the basis of a promised data reduction rate that was never delivered or realized by any user I have interviewed.  That, to my way of thinking, is one deficit of on-array de-duplicating storage appliances and VTLs.  It would be alleviated to some degree if de-dupe were sold as software that could be used on any gear, or better yet offered as an open standards-based function of a file system, mainly so that users avoid proprietary vendor hardware lock-in.  By the way, in response to one commenter, even if it is true that “all storage hardware companies are selling software,” I prefer as a rule to purchase storage software functionality in an intelligent way that makes it extensible to all hardware platforms rather than limiting it to a specific kit.  That, to me, is what smart folks mean when they say “software-defined storage” today.
  2. De-duplication is not a long term solution to the problem of unmanaged data growth.  It is a technique for squeezing more junk into the junk drawer that, even with all of the “trash compacting” value, will still fill the junk drawer over time.  From this perspective, it is tactical, not strategic, technology.
  3. The use of proprietary de-dupe technology mounted on array controllers limited, in many cases, the effect of de-duplication only to data stored on trays of drives controlled by that controller.  Once the box of drives with the de-duplicating controller was filled, you needed to deploy another box of drives with another de-duplicating controller that needed to be managed separately.  I think of this as the “isolated island of de-dupe storage” problem, and many of my clients have complained about it.  Some commenters on the article correctly observed that some vendors, including NEC with its HydraStor platform, had scale-out capabilities in their hardware platform.  True enough, but unless I am mistaken, even vendors that enable the number of trays of drives to scale out under the auspices of their controller still require that all kit be purchased from them.  Isn’t that still hardware lock-in?  My good friend, TheStorageArchitect, said that I should have distinguished between active de-dupe and at-rest de-dupe.  He has a point.  If I had done so, I might have suggested that if you were planning to use de-dupe for something like squashing many full backups with a lot of replicated content into a smaller amount of disk space, an after-write de-dupe process, which can be gotten for free with just about any backup software today, might be a way to go.  But I would also have caveated that, if running a VTL was intended to provide a platform for quick restore of individual files, using a de-duplicated backup data set might not be the right way to go, since it would require the rehydration of the data on restore, introducing potential delays in file restore.  The strategy of at-rest de-dupe in the VTL context also has me wondering why you wouldn’t use an alternative like LTFS tape or even incremental backups.  As for in-line or active de-duplication, he stole my thunder with his correct assertion of the CPU demands of global de-duplication services.  But I digress…
  4. My key point was that real capacity utilization efficiency is achieved not by tactical measures like data reduction, but by data management activities such as active and deep archiving.  Archives probably shouldn’t use proprietary data containers that require proprietary data access technologies to be opened and accessed at some future time.  Such technologies just introduce another set of headaches for the archivist, requiring data to be un-ingested and re-ingested every time a vendor changes its data reduction technology.  This may change, of course, if de-dupe becomes an open standard integrated into all file systems.
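For readers less familiar with the mechanics under discussion, the core of most de-dupe products is block-level fingerprinting: carve data into blocks, hash each block, and store only one copy of each unique block plus a “recipe” for reassembly.  The sketch below is a minimal, hypothetical illustration in Python (fixed-size blocks and SHA-256 are my simplifying assumptions; commercial products typically use variable-size chunking and their own proprietary indexing), and the rehydrate step shows where the restore-time reassembly overhead comes from.

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-size blocks for simplicity; real products often use variable-size chunks

def dedupe(data: bytes):
    """Split data into blocks, keep one copy of each unique block,
    and record the ordered list of block hashes (the 'recipe')."""
    store = {}   # hash -> unique block bytes
    recipe = []  # ordered hashes needed to rebuild the original stream
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)
        recipe.append(digest)
    return store, recipe

def rehydrate(store, recipe):
    """Reassemble the original stream from the block pool; this is the
    restore-time work that can slow single-file recovery."""
    return b"".join(store[h] for h in recipe)

# Simulate three "full backups" with heavily repeated content
one_backup = b"A" * 8192 + b"B" * 4096
data = one_backup * 3
store, recipe = dedupe(data)
assert rehydrate(store, recipe) == data

stored = sum(len(b) for b in store.values())
print(f"logical: {len(data)} bytes, stored: {stored} bytes, reduction {len(data) / stored:.1f}:1")
```

The at-rest approach a commenter described amounts to running a process like this after the backup lands on disk; in-line de-dupe does the hashing in the write path against a global index, which is where the CPU demands mentioned above come from.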

I may have missed a few of the other points I made in my response to the comments on the TechTarget site, but I did want to clarify these points.  To those who said that my claims about de-dupe’s fading star were bogus, I can only offer what I am seeing.  Many of my clients have abandoned de-duplication, either after failing to realize anything like the promised data reduction value touted by product vendors, or because of concerns about the legal and regulatory permissibility of de-duplicated data.  While advocates are quick to dismiss the question of “material alteration” of data by de-dupe processing, no financial firm I have visited wants to be the test case.  That you haven’t seen more reportage on these issues is partly a function of hardware vendor gag orders on consumers, prohibiting them, under threat of voided warranties, from talking publicly about the performance they receive from the gear they buy.

If you like de-dupe, if it works for you and fits your needs, great!  Who am I to tell you what to do?  But if you are trying to get strategic about the problem of capacity demand growth, I would argue that data management provides a more strategic solution than simply putting your data into a trash compactor and placing the output into the same old junk drawer.


Post Post

I am informed that my rejected comment submittal has miraculously appeared in the appropriate section of the TechTarget site.  No one understands what happened or how it resolved itself, but there it is.

Second, the original version of this post named the fellow associated with the handle TheStorageArchitect:  Chris Evans.  (No, not Captain America.  The other Chris Evans.)  But Chris, who I follow on Twitter, advised me that he did not make the post.  So, I have redacted his name from the original post here.  Apologies to Chris for the incorrect attribution.




Tape's Renaissance is Real
At the IBM Edge 2014 conference in Las Vegas this past May, I had the chance to reconnect with the IBMers most responsible for the development and marketing of the Linear Tape File System (LTFS).

I had interviewed Ed Childers, who hails out of IBM Tucson, before. He has been my go-to guy on IBM tape technology for several years. Shawn Brume, Business Line Manager for Data Protection and Retention was a new face to me, but we had a great conversation going in no time.

Here is my interview with these folks, recorded in the always busy and very noisy Edge Social Media Lounge. We tweaked the audio the best we could, but it is still a bit sketchy in a few places. So, apologies in advance.

Frankly, I enjoyed this interview tremendously. Ed and Shawn were fresh and off-the-cuff, and Childers, in the final segment, gave what has got to be the best definition I have heard of software-defined storage, asserting that that is exactly what LTFS is. And I think, to the extent that SDS means anything at all, he is absolutely correct.

So, enjoy the following interview clips.

Great interview opportunities with consumers and innovators of next generation technology are one of the main things that see me hopping a jet every year to the Edge conference. Learning what is developing in the areas of tape technology and archive is important to my clients and readers, and I thank Ed and Shawn for giving this great synopsis.

BTW, I am obligated to note that I was at Edge courtesy of IBM, who provided travel and lodging expense reimbursement for delivering my five sessions at the TechEdge part of the conference. When I wasn’t attending or delivering seminars, I hung out in the Social Media Lounge, where these videos were recorded, and was compensated for tweeting and blogging about the show. My words were not censored or subjected to any advance review or approval. My opinions are my own. You might want to think about Edge 2015 today.




Storage for the Tragically Un-Hip

by Administrator on July 2, 2014


I have been hearing from readers recently who, while complimenting me on my practical articles on topics such as storage efficiency, common sense storage architecture, and so forth, point out that I am covering topics that the “hip and cool” generation of IT folk don’t really care about.

For example, I have just completed a new set of tips at TechTarget with provocative titles like…

(Note: some of these titles are mine, others were created by editors at TechTarget.)

And, of course, the first issue of (my quarterly e-zine) focused on tape and the next issue, currently under development, looks at data and infrastructure management — two more topics with limited curb appeal to those with their heads up…er…in the clouds.

I may be tragically unhip, but I will continue to look at these under-discussed topics, if only to ensure that someone does. I reviewed tweets this AM that had come in since midnight last night and was amazed at the volume of noise around software-defined stuff, cloud woo, and of course server virtualization, all of which emanates from the same sources. What bothers me is the way that folks I respect are helping to echo the noise, typically because their jobs depend on it. That is on top of the failure of just about any publication to subject these topics to rigorous investigation, to question their core assumptions, or to interview the many folks who tell me that they are stopping the virtualization of their servers, withdrawing from their public cloud initiatives, and taking a wait-and-see attitude about the whole software-defined thing.

I also need to feed my family, but I think that the long term loss of credibility that results from jumping on every IT bandwagon that comes along (consider how ridiculous Gartner “Magic Quadrants” have become, or IDC “exploding digital universe” papers, or…) is far worse than the short term benefits (financially speaking) that would accrue from evangelizing the BS.

Just had to say something. I am feeling discouraged by all of the mindless chatter. But, I don’t mind being un-hip — it’s part of being a grown-up.

(Sorry for the repost on this.  Server crash, restore.  You know the drill.)


Stupid Pundit Tricks Spill Over into Tech

by Administrator on June 24, 2014

I have been watching with equal parts amusement and disdain as Congressman Issa’s interrogations have proceeded regarding the IRS’ selections of which 501(c)(4) organizations to scrutinize before granting tax exempt status.

The amusing part has been, of course, the dazzlingly inane politicization of the event.  It was discovered that a couple of IRS centers were looking into the validity of requests for tax exempt status placed by organizations that had “political sounding” names.  This seemed pretty much in line with the original intent of the code regarding 501(c)(4) organizations.  Under the language of the original code, an organization should be granted 501(c)(4) status only if it was dedicating 100% of its energy and money to activities of the charitable, religious, educational, scientific or literary variety, or testing for public safety, or fostering amateur sports competition, or preventing cruelty to children or animals.

It was clear from the exclusion from the above list of “political activity” that organizations engaging in political advocacy shouldn’t be treated as tax exempt social welfare groups – whether that political stuff emanates from a left- or a right-wing perspective.

Now, fair minded folks can argue over the interpretation made of the rules by IRS operatives, or the bureaucratic processes used to “screen” and select organizations for review, and I do not intend to debate the points here.  This is not a political blog.

What distressed me today was a technical nuance that came out during a free wheeling and somewhat pointless blathering session on the MSNBC Morning Joe program that happened to be on my TV as I was making my morning coffee.  One pundit in the panel was Alex Wagner, an energetic and sometimes sharp witted progressive pundit with whom I generally agree.  The point of discussion was the loss three years ago of a hard drive in a PC belonging to an IRS witness, who subsequently ditched her damaged PC altogether.  Apparently, some emails considered potentially relevant to Mr. Issa’s investigations were lost in the disk failure and remedial actions could not be taken because the PC itself was subsequently destroyed/recycled.  Very suspicious, ranted Mr. Scarborough, over and over again — Morning Joe’s opinion is one we should take very seriously given his brief stint as a Congressman some decades back.

Anyway, Alex Wagner distressed me, not for what she contributed regarding the subject at hand, but for some of her tangential observations.  She said that the whole event showed incompetence because the practices of the IRS at the time that the disk drive was lost were “out of the 1960s” — “They were still using tape backup, rather than clouds.”

Really?  So, the Issa investigation would be concluded if the email from the IRS was stored in a “cloud” where it would be invulnerable to loss or unauthorized access?  Alex needs a refresher in the many events involving clouds that have seen stored data become unavailable due to deliberate or accidental disruptions. And what about the many instances in which cloud data has been accessed illegally by anyone from the Chinese to the NSA to good old fashioned hackers — not to mention news gathering organizations? And what about those clouds whose business models have proven inadequate to the task of storing data and who, on the eve of bankruptcy, have advised customers to retrieve petabytes of data across slow network connections within a ten day period?

You know, none of the above happens with tape, Alex.  The durability of current media in proper environmental conditions is greater than 30 years.  The low cost and high capacity of tape are legendary, plus the technology provides an air gap between data and stuff that might compromise it, including nosy spy organizations, malware, and malcontents.

There is a reason why something like 80 percent of the world’s data is stored on tape, Alex.  And there is no fact-based reason to adopt an anti-tape position.

Truth is, tape backup is pretty freakin’ reliable, even in a political context.  The loss of millions of emails by the previous White House, which were presumably archived to tape, had nothing to do with tape, but with misuse of the technology.  The tapes were deleted after a period of time and reused because, at least in the official explanation, of confusion over the difference between backup and archive.  The former provides temporary protection of data assets, the latter its long term preservation.  A good jumping off point if you want to educate yourself about this stuff, Alex, is at the archives of George Washington University.  The tick-tock on that event is here.

Bottom line:  you political folk need to go argue about political stuff.  I get that.  But don’t drag storage media into your debates unless you know what you are talking about.  It diminishes your other arguments when you make such stupid assertions.


Sorting Out the File Junk Drawer Replay Ready

June 24, 2014

At the beginning of the Summer, we rolled out the Brown Bag Webinar Series, as those of you who joined us already know. We did three shows in three weeks, working to keep your commitment of time to around 45 minutes and our commitment to delivering useful information at 110%. We have paused […]

Read the full article →

Tarmin: It’s All About the Data

June 16, 2014

At the just concluded IBM Edge 2014 conference in Las Vegas, one of the highlights for me was having a chance to catch up with Linda Thomson, Marketing Director for Tarmin. I knew Linda from her time at QStar Technologies, where she ran worldwide marketing, and I had been hearing positive things about Tarmin’s GridBank technology for a […]

Read the full article →

VSAN in a Nutshell

June 13, 2014

If you agree that shared storage is better than isolated islands of storage, that some sort of virtualized SAN beats the socks off of server-side DAS, that hypervisor and storage hardware agnosticism beats hypervisor or hardware lock-ins, that aggregated storage capacity AND aggregated storage services make for better resource allocation than service aggregation alone, and […]

Read the full article →

Working on Big Data Storage Research Project

June 13, 2014

To all big brains on Big Data:  this is a synopsis of what I plan to contribute to a forthcoming e-book on Big Data, a chapter on storage infrastructure options.  Am I missing anything important? Since Big Data first captured the imagination of organizations seeking to mine the “treasure trove” of data collected in multiple […]

Read the full article →

DMI Developing Basic Training for the Storage Impaired

June 13, 2014

A shift is occurring on the information technology landscape that has ramifications for how IT operates.  The advent of server virtualization using “hypervisor” software will see, according to leading analysts, the consolidation of between 69 and 75% of x86 workloads onto just 21% of deployed x86 servers by 2016.  The remaining physical servers, some 79%, […]

Read the full article →

Webinars Available Now for On Demand Replay

June 12, 2014

It has been an interesting roll-out of the Brown Bag Webinar Series, as those of you who joined us already know.  We did three shows in three weeks, working to keep your commitment of time to around 45 minutes and our commitment to delivering useful information at 110%. We are pausing these webinars until […]

Read the full article →