Posts Tagged ‘Linear Tape File System’

Vblogs from the Edge: The Only Solution to the Zpocalypse – Tape

Monday, October 31st, 2016

Just in time for Halloween, long-time friend Ed Childers, who also happens to be IBM’s LTFS Lead Architect and Tape Development Manager, agreed to be interviewed at this year’s IBM Edge 2016.  Ed caught us up on all things tape, from the realization (finally) of the long-predicted Renaissance in tape technology to the latest developments in tape-augmented flash and disk storage.  Have to admit it, Ed is my brother from another mother.

Childers has been doing a Rodney Dangerfield impersonation for the last couple of years — tape just wasn’t getting the respect it deserved from the user community or the industry.  But with the “zettabyte apocalypse” around the corner, tape is suddenly very sexy.

Regular readers will recall that the zpocalypse to which we refer isn’t a Halloween novelty; it is real.  According to leading analysts, we are expecting between 10 and 60 zettabytes (1 zettabyte = 1,000 exabytes) of new data to hit our combined storage infrastructures by 2020.  This has cloud farmers and large data center operators quite concerned.  Back-of-the-envelope math says that only about 500 exabytes of capacity per year can be manufactured by all flash chip makers collectively, while output from disk makers hovers somewhere around 780 exabytes per year.  Taken together, that totals roughly 2 percent of the capacity required at the upper limit of projected data growth.
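
For those who want to check my arithmetic, here is the back-of-the-envelope math in a few lines of Python.  The production figures are the rough analyst estimates cited above, not precise industry data, so treat the output as an order-of-magnitude sanity check.

```python
# Back-of-the-envelope check of annual media production versus projected data growth.
# Figures are the rough analyst estimates cited above (in exabytes), not hard data.
flash_output_eb_per_year = 500      # approximate annual flash capacity, all makers combined
disk_output_eb_per_year = 780       # approximate annual disk capacity, all makers combined
upper_projection_eb = 60 * 1000     # 60 zettabytes of new data by 2020, expressed in exabytes

combined = flash_output_eb_per_year + disk_output_eb_per_year
share = combined / upper_projection_eb
print(f"Combined annual output: {combined} EB")
print(f"Share of the 60 ZB upper-end projection: {share:.1%}")  # roughly 2 percent
```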

 


The only way that we will possibly meet the demand for more storage is by using tape.  With 220 TB LTO Ultrium cartridges within striking distance, smart cloud and data center operators are already exploring and deploying tape technology again.  Ed is now officially the guy who women want to meet and men want to be.  Here are some of his observations.

 


Thanks, Ed Childers, for taking the time to chat with us.  And thank you to IBM for inviting us to attend Edge 2016 and for making some of your best and brightest available for these video blogs.

For the record, IBM covered the costs for our attendance at IBM Edge 2016 and they gave us a small stipend for live tweeting their general sessions.  The content of these video blogs and other opinions on this post are ours exclusively.

For those who are not familiar with our take on the zpocalypse, here is a refresher, starring Barry M. Ferrite…

 


And here is the follow-on video…

 

 

Thanks again to Ed and to IBM. Great show, that IBM Edge!


Erik Eyberg Talks IBM Storage Strategy (Repost with corrections)

Sunday, June 14th, 2015

This is a repost of a previously published interview with Erik Eyberg, made as a correction to the previous blog, which misspelled his surname as Eyeberg.  He noted the discrepancy in an email on Friday and was so darned kind about it that my sense of guilt over the oversight rose to an intolerable level.  We have corrected the incorrect reference both here and in the video clips below and regret the error.

The video interview was shot at Edge 2015.  This was my fourth IBM Edge event, if memory serves, and my third opportunity to get together with Mr. Eyberg, who came over to IBM with the acquisition of his former employer, Texas Memory Systems, and who has in a comparatively short period of time been promoted through the ranks. He has ditched the nerd glasses and haircut he sported when I first met him for a more Benedict Cumberbatch look (well, that is what my daughters said when we were editing his video interview!), but he is still geek through and through.

And now, his business cards identify him as the Manager of World Wide Enterprise Storage Strategy and Business Development: a hefty title for a truly smart guy who has stepped almost effortlessly into his new and expanded role. It was a great pleasure to chat with him about IBM’s current technology and future directions. Here is part 1 of our interview, in which Eyberg sets the context, then ramps up a discussion of flash storage technology and the fit it is finding within business enterprises.

But wait, there’s more. We venture away from the evangelism of flash technology to discuss in Part 2 of the interview the lingering concerns that many folks (including me) have regarding the oversell of flash as a panacea and the problems created by memory wear and uneven performance. Erik’s point of view is interesting…

Eyberg makes a coherent case for IBM’s diversification in terms of storage offerings between “boxed” (conventional arrays) and “un-boxed” (software-defined storage) offerings. I liked his sensible discussion of unified management and REST near the end of the clip. It is good to know that IBM is still pursuing RESTful management tools for its kit. Look for RESTful management of the DS8000 series array with the release of version 7.5 of the array’s firmware.

On to the final part. Here is where Erik finishes his thoughts on RESTful management and what it will take for everything to be REST enabled for unified management (he seems dubious that this will happen). Then, he and I talk about tape technology, then about the future of storage from IBM’s perspective. Fascinating stuff.

Once again, special thanks to Erik Eyberg for agreeing to this interview (and to Lizbeth Ramirez Letechipia and company for helping me to round up Erik for this interview and for helping to get the clip approved by IBM).

For the record, this is one of several interviews I conducted at IBM Edge in exchange for room, board, transport, and free attendance at the event. I was also compensated for delivering five sessions at Tech Ed as part of the show.

Linear Tape File System: The Only Real Software-Defined Storage?

Tuesday, July 8th, 2014

 Tape's Renaissance is Real
At the IBM Edge 2014 conference in Las Vegas this past May, I had the chance to reconnect with the IBMers most responsible for the development and marketing of the Linear Tape File System (LTFS).

I had interviewed Ed Childers, who hails out of IBM Tucson, before. He has been my go-to guy on IBM tape technology for several years. Shawn Brume, Business Line Manager for Data Protection and Retention, was a new face to me, but we had a great conversation going in no time.

Here is my interview with these folks, recorded in the always busy and very noisy Edge Social Media Lounge. We tweaked the audio the best we could, but it is still a bit sketchy in a few places. So, apologies in advance.

Frankly, I enjoyed this interview tremendously. Ed and Shawn were fresh and off-the-cuff, and Childers, in the final segment, gave what has got to be the best definition I have heard of software-defined storage, asserting that that is exactly what LTFS is. And I think, to the extent that SDS means anything at all, he is absolutely correct.

So, enjoy the following interview clips.

Great interview opportunities with consumers and innovators of next-generation technology are one of the main things that see me hopping a jet every year to the Edge conference. Learning what is developing in the area of tape technology and archive is important to my clients and readers, and I thank Ed and Shawn for giving this great synopsis.

BTW, I am obligated to note that I was at Edge courtesy of IBM, who provided travel and lodging expense reimbursement for delivering my five sessions at the TechEdge part of the conference. When I wasn’t attending or delivering seminars, I hung out in the Social Media Lounge, where these videos were recorded, and was compensated for tweeting and blogging about the show. My words were not censored or subjected to any advance review or approval. My opinions are my own. You might want to think about Edge 2015 today.

 


Data Protection at Edge 2014

Tuesday, April 22nd, 2014

Preamble:  The FTC requires that I disclose a financial relationship with IBM in connection with Edge 2014.  I will be blogging from the show, both in text and video, and actively tweeting my thoughts about what I see and hear.  Apparently, anything I write about the show is considered compensated work, even though IBM does not see it or have any say over its content. 

Disclosure complete. 

I will also be delivering, in exchange for a pass to the show, five sessions at TechEdge.  I probably could have gotten the pass just by playing journalist, but what the heck:  I like doing stand-up!

One of my sessions, as noted in a prior blog post, covers Software Defined Data Centers and the inherent need for Disaster Recovery Planning – this, despite the hype from the SDDC crowd about high availability trumping DR.  I beg to differ.  We can discuss my views on Wednesday 21 May from 4:30 to 5:30PM in Lido 3105 at the Venetian.

Another session will discuss how the Linear Tape File System (LTFS) is giving tape a renewed role in the data center.  That talk is scheduled for Monday, 19 May, from 4:15 til 5:15 in Murano 3305.  I have dedicated quite a bit of copy to this topic at IT-SENSE.org and included a multi-part video interview I shot with IBM’s LTFS wise guys at last year’s Edge.  I am looking forward to updating my interviews with these guys at the upcoming show.

The final three sessions are on the subject of data protection, discussing data vulnerabilities, technologies for making data copies, and considerations for selecting the right technologies to protect your data.  It updates a seminar series I delivered in a dozen cities last year to good reviews.  I think the series is important enough to merit a self-serving shout out.  Here are the dates:

  • Part I is on Tuesday 20 May from 10:30 to 11:30 in Murano 3305
  • Part II is scheduled for Wednesday 10:30 to 11:30 in the same room.
  • Part III is on Thursday, same Bat Time same Bat Channel.

Why three sessions on this topic?  Simple.  I am increasingly concerned about the failure of companies to adequately protect their irreplaceable data assets.  The woo from the virtualization/cloud/software-defined crowd holds that server virtualization, with its template cut-and-paste (vMotion) capability, “builds in” high availability.  Therefore, disaster recovery planning is no longer needed.  Some people are buying it.

No.  Really.  They are.

At the server layer, this is an oversimplification, and in my experience, a bit of prevarication.  Automatic subnetwork-level failover doesn’t work consistently or predictably in my VMware environment.  (I know, I am just doing something wrong and need to buy $25K of VMware certification training or something.)  Moreover, not all of the data required for application reinstantiation is included in the virtual machine file; app data needs to be replicated on an on-going basis too.

But where should I replicate the app data?  To every DAS array connected to every server kit that might possibly host my virtual machine?  Really?  Perhaps that explains why IDC and Gartner project 300-650% year over year capacity demand jumps in highly virtualized server environments.

Moving outside the data center and into the cloud, we see the same problem writ larger.  CloudEndure is giving great insights in their marketing woo right now:  to move a workload from one cloud to another requires a lot of extra contextual data besides workload descriptions and app data.  Go watch one of their webinars.

Bottom line:  the “built-in continuity capability is better than a bolted-on one” argument that the hypervisor peddlers are using doesn’t match reality.

Data protection requires a “business-savvy” strategy that matches the right protection services to the right data based on the application that the data supports and the business process that the application serves.  “Business-savvy” means that the strategies meet carefully considered recovery priorities in a readily-tested, coherently-managed, and cost-sensitive manner.

Ranting on:  There is no one-size-fits-all strategy for data protection.  Active-Active Clustering with Failover is great when it works, but it is usually also the most expensive way ever devised to protect an application and its data and is not appropriate for all applications.  Rather, we need to use defense in depth, combining different services and associating them at a granular level with the data assets we need to protect.

Sounds complex because it is.  That’s why DR guys and gals make the big bucks.

Another key obstacle to the HA clustering of everything idea is the WAN.  Go more than 70 km over a WAN and you start to sense the latency.  This should factor into your geo-clustering strategy and your cloud strategy if you decide to use clouds for data protection.  Deltas accrue to asynchronous replication, which you must use for replication over distances greater than about 40 miles.  That can screw with your recovery point/time objectives.  And if you plan to use remote data with local servers, you are looking at delays that may accrue sufficient latency to shut down apps or databases altogether.  Just getting a lot of data back from a cloud, if you use it for backup, can be problematic.  Ask the big capacity consumers of the Nirvanix storage cloud, who were given ten days or less to retrieve petabytes of data from that failing storage cloud.  Even if they had access to OC-192 pipes, the nominal transfer speed would have been a whopping 10 TB every 2.25 hours…and you never see optimal transfer rates over public WAN links.
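
For the math-inclined, here is a rough sketch of where that transfer figure comes from, assuming a fully saturated OC-192 link running at its nominal line rate (a best case you will never see over a public WAN).

```python
# Rough transfer-time math for bulk data retrieval over an OC-192 link.
# Assumes the nominal line rate of roughly 9.95 Gbps with no protocol overhead
# or contention, a best case that never happens on a shared WAN.
OC192_GBPS = 9.953                      # nominal OC-192 line rate, gigabits per second

def transfer_hours(terabytes: float) -> float:
    bits = terabytes * 1e12 * 8         # decimal terabytes converted to bits
    return bits / (OC192_GBPS * 1e9) / 3600

print(f"10 TB at line rate: ~{transfer_hours(10):.2f} hours")        # about 2.2 hours
print(f"1 PB at line rate:  ~{transfer_hours(1000) / 24:.1f} days")  # about 9.3 days
```

At that rate, a single petabyte consumes more than nine days of a ten-day retrieval window before you even factor in real-world throughput.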

Anyway, I wouldn’t use a cloud for data protection unless the provider offered tape as a mechanism to return my data to me.  Almost as efficient as IPoAC (IP over Avian Carrier).  Even if the cloud service is on the up and up, its ability to deliver on its service level agreement depends on the WAN, which is not typically under its control.

Anyway, those are just a few themes and memes we will explore in those three sessions on Data Protection that I will be doing at IBM Edge 2014.  Your alternatives to my data protection sessions at TechEdge are a broad range of other interesting talks delivered by authentically smart and entertaining guys like Tony Pearson.  Or, you can abandon all care about risk and lose some money at the tables in the Venetian.  Your call.

Anyway, I hope to see you at Edge 2014.  Here’s where I point you to IBM’s page where you can register to attend.  (The compensated part of this post, I guess.  But, I still hope to see you there!)


Following Up on IBM Edge 2012

Thursday, June 21st, 2012

While no shortage of articles, columns and blogs has hit the airwaves, print media and Intertubes in the wake of IBM’s excellent Edge 2012 conference, underscoring the value proposition of IBM’s active data compression technology (manifested today on products like Storwize V7000 and SAN Volume Controller), my enthusiasm remains somewhat dampened.

Yes, I contributed to the noise around the compression announcements.  I moderated a webcast about the IBM compression technology story just this past Wednesday.  (The customer testimonial was compelling and probative, and the IBMers on the call were excellent advocates for their wares, as always.)  But, still, I want to be clear.

I see compression (and deduplication), at best, as tactical technologies. More precisely, I see them as weapons for fighting a delaying action in the struggle most companies face with respect to unmanaged data growth.

Compression (and dedupe to some extent) have the potential to slow the rate at which we fill our disk systems, but they ultimately do little-to-nothing to turn the tide, or to bend the cost curve in storage over the long term.

Ultimately, unmanaged data growth, combined with the store-everything-on-disk dogma (Omnia in Orbis) that has become so dominant in the marketecture around storage today, will lead to a predictable outcome:  storage acquisition expense will soon dwarf all other IT hardware spending in contemporary data centers.

And just as inevitably, increasing the complement of disk will ultimately run afoul of rising energy costs and will plow headlong into the coming energy crisis created by an archaic and over-saturated power distribution grid that is already creating issues in the New England Corridor, Northern and Southern California, the St. Louis grid nexus, and elsewhere.

 

To deal with the challenge effectively and permanently, our solution must be strategic and long-term.  If compression and dedupe deliver tactical advantage, it is by delaying the deployment of more disk capacity and by buying us time to do what must be done:  coming up with a way to manage the glut of data that is finding its way into the data junk drawer at an accelerating rate.

That’s a major reason why I believe that the most important technology demonstrated at the IBM Edge 2012 event was not compression, but rather an up-and-coming mass storage technology leveraging magnetic tape and the Linear Tape File System (LTFS).

I am going to be writing about this on ESJ.com and probably in several other venues in the near future.  Here is some of my source material:  an interview I was generously granted at Edge with the top LTFS mavens at IBM.

The videos below are a bit noisy at first, shot as they were in the social media area of the Edge event.  But, I hope you will find them useful in understanding the history and future direction for LTFS-augmented tape storage from the perspective of IBM.  I will contribute my commentary in a later post, and hope to add interviews with others in the industry who are offering complementary technology to realize the Tape NAS vision, including Spectra Logic and Crossroads Systems.

 

 

Followup blogs to come…

If Common Sense Prevailed, What Would Storage 2012 Look Like?

Saturday, March 24th, 2012

This is a paper built on my webcast a couple of weeks ago for IBM…

Hot Storage Trends for 2012

Will We See the Year of the Infrastruggle or Will Common Sense Prevail?

Like death and taxes, data growth is inevitable.  Truth be told, few IT planners know at what rate their data is growing; they only know how much more storage capacity they are adding to their infrastructure year over year.

As in past years, storage infrastructure capacity will likely grow significantly in 2012. Analyst forecasts range from 40% to more than 100%.  Whichever estimate proves out, it is safe to say that capacity growth will be accompanied by a significant increase in the cost of storage infrastructure – from the standpoint of both acquisition expense (CAPEX) and operating expense (OPEX).  It remains to be seen whether this trend will escape the notice of senior managers, who continue to pursue cost-containment strategies in the face of a tentative global economic recovery.

Bottom line:  2012 could be the Year of the Infrastruggle: a contest to decide whether IT  will remain an internal function of the company or will be purchased as a set of services delivered by external providers.

Alternatively, if common sense prevails, 2012 could usher in a number of improvements in how the data burgeon is stored and managed that will make important gains toward the goals of cost-containment, compliance, continuity and carbon footprint reduction.

Hot Storage Trends for 2012

Will We See the Year of the Infrastruggle or Will Common Sense Prevail?

Introduction

Any projection of business technology futures needs to be anchored in a clear-headed assessment of current realities and trends.  Thus, it makes sense to survey the data from leading industry analysts regarding the situation that confronted business technology planners at the end of 2011 and the trends that are believed to be shaping decisions as we enter 2012.

Multiple industry analysts who track the worldwide deployment of external storage arrays have generated the statistics that are summarized on the following chart.  According to their data, companies entered 2012 with more than 20 Exabytes of external storage deployed in support of their business applications.

This growth trend is expected to continue (and some analysts argue accelerate significantly, as indicated in red), with the storage of file-based data growing at least 2X faster than block or transactional data storage, and also with an increasing percentage of storage capacity being deployed to host copies of data.  This is being widely attributed to a consumer preference for using disk-based data protection methods as a replacement for tape backup, though no data is offered regarding the volume of data currently stored on tape media.

We will likely see the perpetuation of another trend in storage as 2012 unfolds: the continued blurring of the lines between the traditional hierarchy of storage gear that has persisted for about three decades.  The traditional “hierarchy” of storage products is illustrated below.

For years, we have seen storage platforms that aligned, in terms of price, performance and capacity, with the frequency of data re-reference.  Data that was being read and written with the greatest frequency went to high speed/low capacity disk arrays that tended to be quite expensive.  As the data cooled, and its re-reference rate decreased, it was typically stored on a different type of disk array, one offering more modest speeds and feeds, greater capacity, but that cost a bit less than its high speed cousin.  A third tier of storage products – notably, those featuring tape and optical media, but also some disk subsystems – provided enormous capacity, significantly lower access speeds, and a significantly lower price than the other classes of storage products.  These platforms are used to store rarely-accessed data, such as archives and backups. Traditionally, a disciplined IT shop migrated data between these three tiers of repositories to achieve capacity allocation efficiency.

The new Millennium has seen some changes to this familiar storage meme.  Vendors have introduced products that blur the boundaries between primary, secondary and tertiary storage.

For example, over the past five years, we have seen many vendors introduce “tiered storage arrays” – products that blend high and low speed drives in the same cabinet and automated functions for moving older data to the higher capacity/lower speed drive trays in the kit.  We have also seen the introduction of de-duplicating disk appliances and even a few MAID (massive array of idle disks) products, featuring the spin-down of drives storing data that is rarely re-referenced.  In both cases, the vendors posit these products as a replacement for tertiary storage based on tape and optical.

While the relative value, cost efficiency, and wisdom of these platforms could be debated, the bottom line is that these changes in the traditional storage landscape are having the net effect of increasing the costliness of storage platforms to businesses that use them.

Depending on the analyst one consults, storage now consumes between 33 and 70 percent of annual hardware budgets in businesses worldwide.  That percentage is also poised to grow in 2012.  And of course, that comes at a time when the majority of businesses are seeking to contain IT costs and to reduce overall budgets while increasing service levels from business automation.

This conundrum is leading to what DMI terms “Infrastruggle 2012” – a dramatic way to express the challenge that storage administrators and data managers confront as they seek to formulate storage infrastructure strategies for the new year and beyond.

Additional potential drivers or influencers of storage trends in 2012 will likely include the following:

  • Storage Hypervisor Functionality Added to Server Virtualization Stacks:  A leading server hypervisor vendor has announced plans to “enhance” its product with a new “storage hypervisor” microkernel.  This might make sense on a whiteboard, but the practical ramifications will be potentially very disruptive for existing storage infrastructure.  First, such a move will require users of the server hypervisor to segregate storage associated with their virtualized workloads from the rest of their storage infrastructure, which presumably supports both non-virtualized applications and host systems running other server hypervisor wares.  Given the efforts in many companies to aggregate their storage into fabrics over the past decade, this is potentially a significant and costly reversal that could have very disruptive effects.
  • Storage “Clouds”:  Storage “clouds” (service providers offering external on-demand storage capacity and other storage related services) are growing in number.  At present, these offerings, which are very similar to the Storage Service Providers of the late 1990s and to some service bureau outsourcing models in the 1980s, seem to appeal more to senior management than to IT managers.  But whether they represent more marketecture than architecture today, the appeal of a low cost, high capacity “disk drive in the sky” may gain favor in 2012 – especially if the most extreme analyst estimates of data growth take hold.
  • Disk Drive Prices:  Disk drive prices have been inflated by purported supply disruptions attributed to 175 days of flooding in Thailand, which ended in January.  While drive vendors have reported shipments of drives in line with forecasts in 2011, this has not kept others in the supply chain from increasing disk prices in anticipation of supply shortfalls.  It will be interesting to see whether the impact of higher drive prices will be a short-lived slowdown in array purchases, an increase in the competitiveness of Flash SSD as a storage component in 2012 storage architectures, or an increase in the appeal of storage clouds in the short term.
  • Increasing Use of Capacity Optimization Software:  Another trend likely to continue in 2012 is the on-going addition of value-add software (compression, de-duplication, etc.) to array controllers, in many cases dramatically increasing their acquisition cost.  The value case for this functionality is usually linked to capacity allocation efficiency – the technology will enable storing more bits in a fixed space.  However, the ultimate determinant of storage utilization efficiency is more closely tied to capacity utilization efficiency, which entails storing the RIGHT bits on a given storage medium.

This situation report summarizes the views of analysts and pundits regarding the road ahead for storage.  At its core is a cynical view of the likelihood that the challenges of growth in both data volume and storage cost of ownership will engender new approaches and strategies.  While such cynicism may be justified by past history, it provides little guidance to planners about alternative approaches to designing a storage practice that might deliver greater engineering, operational and financial cost-efficiency going forward.

Here are trends that we might see in more forward-looking business technology plans this year if common sense prevails:

  • Increased investments in storage infrastructure management technology
  • Increased investments in storage infrastructure virtualization
  • A rethinking of storage infrastructure design tied to energy consumption metrics – leading to the design of fast capture storage pools with an eye toward obtaining the most IOPS per watt of electricity, and the design of slower retention storage pools based on the metric of capacity per watt consumed.
  • Renewed interest in archive – which begins to move us toward the goal of data management and capacity utilization efficiency.

Common Sense Trend 1:  Better Storage Infrastructure Management

Few would dispute that contemporary storage infrastructure requires better management.  According to leading industry analysts, the acquisition cost of storage hardware is only 20% of the total cost of ownership of storage.  The other 80% of storage TCO comprises operational, environmental and administrative costs that relate directly to how well (or poorly) that storage infrastructure is managed.

Today, most storage continues to be managed on a “one-off” basis, using element management software provided by the vendor of each array deployed.  Element management becomes less and less efficient as storage infrastructure expands and includes “heterogeneous storage” (different products from different vendors).

The lack of a unified management paradigm for storage is owed to many factors, of course, with the failure of buyers to prioritize management when purchasing gear being key to the problem.  If consumers settled on a storage management software product and made it an absolute criterion for vendors seeking to sell them storage wares, storage infrastructure management would be baked into their infrastructure already.

Why is this important?  Absent a common management paradigm, storage infrastructure features higher labor costs, more downtime, and more inefficiency.

In a common sense world, 2012 would see a breakthrough in the area of universal, standards-based storage management.  RESTful management, built on open Web standards, holds much potential.  REST has been embraced by many vendors, including IBM in its Project Zero initiative, but little movement has been made to implement the technology.  One vendor that has already designed RESTful management on its gear, X-IO, has published its code at an open website (cortexdeveloper.com), making its templates accessible to the whole industry free of charge in keeping with the idea of open standards.
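
To make the idea concrete, here is a minimal sketch of what RESTful storage management looks like from the consumer side.  The endpoint, resource names and JSON fields below are hypothetical placeholders of my own invention, not X-IO’s published templates or any vendor’s actual API; the point is the pattern of plain HTTP verbs against addressable resources.

```python
# A hypothetical sketch of RESTful storage management from the client side.
# The endpoint, resources and fields are invented for illustration; only the
# pattern (HTTP verbs on addressable resources, JSON payloads) matters here.
import requests

BASE = "https://array.example.com/api/v1"   # placeholder management endpoint
AUTH = ("admin", "changeme")                # placeholder credentials

# Enumerate storage pools with a simple GET.
pools = requests.get(f"{BASE}/pools", auth=AUTH, timeout=10).json()
for pool in pools:
    print(pool["name"], pool["capacity_gb"], pool["free_gb"])

# Provision a volume with a POST against the same resource tree.
payload = {"name": "archive01", "size_gb": 2048, "pool": pools[0]["name"]}
resp = requests.post(f"{BASE}/volumes", json=payload, auth=AUTH, timeout=10)
resp.raise_for_status()
print("Created volume:", resp.json().get("id"))
```

Any vendor gear that answered calls like these could be driven by the same scripts and consoles, which is exactly the heterogeneous management outcome that element managers have never delivered.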

Whether REST emerges as a dominant method for enabling heterogeneous storage management in 2012 remains an open question.  It would certainly require an unprecedented level of cooperation between market competitors. At a minimum, there are fine vendor products for storage management – like Tivoli Storage Manager from IBM and others – that could be adopted as a corporate standard today and leveraged in storage vendor negotiations to help tame the OPEX costs of heterogeneous storage infrastructure.

Of course, storage resource management (SRM) software may help to make certain storage operations processes more efficient, but SRM alone is not the alpha/omega of storage management.

The strategic move today is to conceive of storage as a service that supports business requirements.  SRM is a component of such a service, but there are other “service management tasks” beyond the management of disk drives and I/O plumbing that also need to be considered:  things like capacity management, performance management and data protection management, just to name three.

Per the illustration above, a service-oriented view of storage management would include not only SRM, which deals with hardware and plumbing, but also a “meta-manager” that would enable the efficient delivery of storage services to applications – and to data itself.  Such a storage management approach could bring about a sea change in how we operate and manage storage today.

More to the point, it could be implemented in 2012 – with the help of storage virtualization technology.

Common Sense Trend 2:  The Adoption of Storage Virtualization

Most storage infrastructure today comprises a fabric and/or network of multiple storage kits from multiple vendors. As previously stated, most arrays deliver their own element management software, and increasingly, each delivers a subset of “value-add” software services hosted on its array controller.  The result is a disparate and hard-to-manage infrastructure with “islands” of special functionality isolated to specific rigs.

Now, virtualizing that infrastructure – which basically means placing a software layer over the top of physical storage which enables the physical devices to be aggregated into resource pools – enables some interesting possibilities.  For one, it provides a place for service-based management to happen.

For example, if the storage planner likes technologies like thin provisioning, why limit that functionality to just one array, when it could be delivered as a service across all arrays in a virtualized storage pool?

Another example:  if some data, supporting mission critical applications, requires some sort of continuous data protection, plus replication across an MPLS network in order to support high availability, why can’t these services be associated with that data?  Alternatively, why not direct this data to a virtualized storage pool that delivers this functionality, while routing other data that doesn’t require such HA services to a different storage pool that delivers a different kind of data protection service?
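
Here is a toy illustration, in Python, of the routing idea described above.  The class names, service names and pool names are placeholders I made up for the example; the point is simply that protection services get attached to classes of data, and the data gets steered to a virtualized pool that actually delivers those services.

```python
# A toy mapping of data classes to protection services and storage pools.
# All of the names here are hypothetical placeholders, not real products or policies.
PROTECTION_POLICIES = {
    "mission_critical":   {"services": {"cdp", "sync_replication"},       "pool": "ha_pool"},
    "business_important": {"services": {"snapshot", "async_replication"}, "pool": "standard_pool"},
    "archival":           {"services": {"tape_copy"},                     "pool": "retention_pool"},
}

def place_data(data_class: str) -> str:
    """Return the virtualized pool whose service catalog matches the data class."""
    policy = PROTECTION_POLICIES[data_class]
    print(f"{data_class}: route to {policy['pool']} with services {sorted(policy['services'])}")
    return policy["pool"]

place_data("mission_critical")   # continuous data protection plus synchronous replication
place_data("archival")           # a tape copy is enough for this class
```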

Bottom line:  storage services could be delivered more efficiently in a virtualized storage environment.  Numerous storage virtualization products exist today that could be deployed with minimal disruption to existing infrastructure to enable a service-oriented storage provisioning model.  Ultimately, storage virtualization could help reduce cost, improve capacity management, and generally improve storage I/O performance across infrastructure by providing DRAM caching and link load balancing.

Common Sense Trend 3:  Reorganize Storage in a More Efficient Way

In fact, a virtualized storage infrastructure could set the foundations for a fundamental reorganization of storage assets themselves to facilitate a more suitable infrastructure in the face of burgeoning data growth.

In concept, reorganizing storage means dividing the infrastructure into at least two pools –  a capture storage pool – optimized with the performance required to handle the initial data workload from applications and end users – and a retention storage pool – optimized for storing less frequently accessed data.  The idea seems to be gaining quite a bit of mindshare among storage strategists who participate in DMI surveys and could be implemented, again, with minimal disruption, in many IT shops using existing infrastructure assets and storage virtualization. The driver of this new infrastructure design appears to be the cost and availability of energy.

Common Sense Trend 4 & 5: Build Storage with an Eye toward Energy Usage

For years, we have been building “capture storage” – that is, storage designed to be as fast as our most demanding transaction systems – using a very large complement of disk arranged in parallel (thereby spreading out workload over many spindles) and often using a technique called “short-stroking” (limiting the number of tracks on each disk that are actually being used to reduce read/write head seek operations).  Such a configuration does buy better performance, but it uses considerable electricity in the process to power hundreds, or even thousands, of drives.

Last year, we saw the introduction of a new method for optimizing drive performance – augmenting the disk with a Flash SSD and using a sophisticated tiering methodology to spread operations across both devices.  In one implementation, data written to the disk that is subsequently receiving numerous concurrent read requests – becoming “hot,” to use the common parlance – is copied into the Flash SSD component, where the read requests can be handled at a much higher IOPS rate.  When read requests diminish, and the data “cools,” access requests are re-pointed back to data on the disk.
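
Here is a deliberately simplified sketch of that promote-on-heat, demote-on-cool policy.  It is a toy model of my own, not any vendor’s actual tiering algorithm, and the thresholds are arbitrary, but it captures the mechanics described above.

```python
# A toy promote/demote loop for flash-augmented disk. This is a simplification of
# the idea described above, not any vendor's actual tiering algorithm.
from collections import defaultdict

PROMOTE_THRESHOLD = 100   # reads per interval that make a block "hot" (arbitrary)
DEMOTE_THRESHOLD = 10     # reads per interval below which a block has "cooled" (arbitrary)

read_counts = defaultdict(int)   # block id -> reads observed in the current interval
on_flash = set()                 # block ids currently copied to the flash tier

def record_read(block_id):
    read_counts[block_id] += 1

def rebalance():
    """Run at the end of each monitoring interval."""
    for block, reads in read_counts.items():
        if reads >= PROMOTE_THRESHOLD and block not in on_flash:
            on_flash.add(block)        # copy the block to SSD and serve reads from flash
        elif reads <= DEMOTE_THRESHOLD and block in on_flash:
            on_flash.discard(block)    # re-point reads back at the copy on disk
    read_counts.clear()
```

In a real array, the bookkeeping typically happens in the controller at sub-LUN granularity and the thresholds adapt to the workload, but the basic idea is the same.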

Such technology is superior to short-stroking hundreds of disk drives, and it consumes far less power.  As a consequence, implementing this technology economizes on hardware and power costs while delivering the required speed.  This is an example of how capture storage can be made more energy efficient, suggesting a new metric for measuring overall storage efficiency: IOPS per watt.

A different metric may be used to measure the efficiency of “retention storage.”  Remember that retention storage holds data that has much lower rates of re-reference than capture storage, so the metric that should guide design is not IOPS per Watt, but Capacity per Watt.
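
A quick illustration of how the two metrics work, using made-up numbers that stand in for a flash-assisted capture pool and a tape-backed retention pool (neither row is a real product benchmark):

```python
# Illustrative only: hypothetical figures showing how the two efficiency metrics
# are computed. Neither configuration is a real product benchmark.
capture_pool = {"iops": 200_000, "usable_tb": 40, "watts": 1_200}   # flash-assisted disk
retention_pool = {"iops": 500, "usable_tb": 2_000, "watts": 300}    # tape-backed retention

print("Capture pool:   %.1f IOPS per watt" % (capture_pool["iops"] / capture_pool["watts"]))
print("Retention pool: %.1f TB per watt" % (retention_pool["usable_tb"] / retention_pool["watts"]))
```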

The desire to optimize Capacity per Watt was behind one of the most exciting developments in storage last year:  Tape NAS. As with IOPS per Watt-optimized storage, Tape NAS involves a cobbling together of two storage technologies: disk and tape.

Basically, Tape NAS uses a generic server platform (with some internal disk for file caching) to host a tape file system.  The Linear Tape File System (LTFS) is the latest file system for tape storage, and is enjoying widespread trial deployments as this paper goes to press. The server provides physical connectivity to the tape library, as well as access for users via a standard network file system protocol such as NFS or CIFS/SMB.  Users interface with the Tape NAS in the same manner as they would any file server, via a file listing.

After cobbling together the necessary hardware and software, the resulting “Tape NAS” delivers performance that is more than adequate for storing infrequently accessed data.  DMI’s testing with various configurations is recording data access speeds of between 30 seconds and 2 minutes depending on a number of factors.  Certain types of files, such as broadcast or surveillance video, genome datasets, and others are actually delivered more rapidly from Tape NAS than from rotating media.

Today, and in the future, tape storage delivers massive storage capacity (petabytes per raised floor tile) with minimal energy consumption. The companies that are aggressively pursuing this strategy aren’t necessarily motivated by concerns about climate change or carbon footprint.  Their motives come down to two issues.  First, on average, utility power costs have increased by 23.2 percent in the past two years – an effective motivator to find ways to economize on power consumption.  Second, companies in certain areas of the country – the New England corridor, and Northern and Southern California today – are reporting that additional power is becoming more difficult to obtain for their data centers.  The power grid is saturated.

2012 may well see energy prices increase, and the problems of power distribution worsen and expand to include the St. Louis, MO area, according to North American Electric Reliability Corporation reports available online.  These are good reasons to start thinking today about techniques for optimizing the energy demands of storage, which, of all computing hardware deployed in the contemporary data center, is reported to be the biggest power user.

Tape NAS is one approach for providing low cost, high capacity retention storage that makes considerable sense from a Capacity per Watt perspective.  Similarly, Flash SSD-optimized disk arrays are showing promise as an energy-savvy replacement for traditional performance arrays from an IOPS per Watt perspective.

Common Sense Trend 6:  Archive

Contrary to most of the marketing pitches of storage vendors, just improving the capacity allocation efficiency and hardware management capabilities of storage gear will not provide a long term solution to the problem of storage cost acceleration.  Technologies like de-duplication and compression will at best defer the consequences of the data deluge.

To make a meaningful difference in storage cost containment, focus must be placed on the data that is being stored. DMI’s analysis of over 3000 storage environments over the past three years shows that — on average – companies are using nearly 70% of the capacity of every disk deployed with data that doesn’t need to be stored on disk at all.  The chart below tells the tale.

With a bit of data hygiene, storage administrators could reclaim the space occupied by junk data and orphan data, or lost as dark storage.  This data and lost space currently equal about 30% of the space on every disk in service in most firms.

Another 40% of the data stored to disk is of archival quality and doesn’t need to be stored on magnetic disk at all.  This data could be migrated to Tape NAS.

Data hygiene and archiving together could be used to optimize current storage, in the process reclaiming upwards of 70% of storage infrastructure capacity.  That would “bend the storage cost curve” significantly.  So, why isn’t it being done?

Among companies surveyed by DMI, user files are viewed as the biggest contributor to capacity demands.  Files are one of several data types, but they are growing at double the rate of database output, and nearly five times as fast as workflow or email data types.  Plus, files are “anonymous” in terms of their content and business context. It is generally “beyond the pay grade” of most storage managers to figure out which files are important and which aren’t.  And, of course, nobody wants to delete anything.

Clearly, file archiving is needed, but in many shops, archive gets a bad rap. Partly, this is owed to an unfortunate association with Information Lifecycle Management (ILM) in many people’s minds.  ILM was oversold in past years as a data management panacea, but failed to deliver on its value case. Too many companies discovered to their chagrin that ILM wasn’t a product, but a complex process:  the holy grail of archive.

The truth is that ILM isn’t required to get going with an archive practice.  The three approaches described below can be adapted to fit just about any environment today.

A simple method for culling out old data is to identify users who are consuming a lot of space with files they never access. Most storage resource management software packages offer good tools that can identify infrequently accessed files and their owners using file metadata.  Leveraging such capabilities and running a report every 90 days that identifies infrequently accessed files provides business unit managers with the information they need to work with their staff to sort out what needs to be on disk and what could be placed on tape.  That is a common sense strategy that is comparatively simple to implement.
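
For shops without an SRM package, even a bare-bones script can produce the same sort of 90-day report.  The sketch below is a minimal example of the idea; the mount point is a placeholder, and it leans on access and modify timestamps, which assumes your filesystems are not mounted with atime updates disabled.

```python
# A bare-bones 90-day stale file report: walk a share, flag files untouched for 90
# days, and total the space per owner. The mount point is a placeholder.
import os
import pwd
import time
from collections import defaultdict

ROOT = "/mnt/fileshare"             # hypothetical NAS mount to scan
CUTOFF = time.time() - 90 * 86400   # 90 days ago

stale_bytes = defaultdict(int)
for dirpath, _, filenames in os.walk(ROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        try:
            st = os.stat(path)
        except OSError:
            continue                # skip files that vanish or deny access mid-scan
        if max(st.st_atime, st.st_mtime) < CUTOFF:
            try:
                owner = pwd.getpwuid(st.st_uid).pw_name
            except KeyError:
                owner = str(st.st_uid)
            stale_bytes[owner] += st.st_size

for owner, size in sorted(stale_bytes.items(), key=lambda kv: -kv[1]):
    print(f"{owner:15s} {size / 1e9:8.1f} GB not touched in 90+ days")
```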

A second strategy is to migrate older files into a Tape NAS platform as discussed above. Migrated files are still accessible through a NAS mount and a file system call.  While there may be a slightly longer wait for file retrieval after a request, the inconvenience should be minimal given the very infrequent access being made to these files today.  A Tape NAS solution can be built readily from an existing tape library equipped with drives that support media partitioning, a generic server, a small disk array and a tape file system like LTFS (a free download from IBM).  Alternatively, the Tape NAS head can be purchased as an appliance from vendors like Crossroads Systems, or assembled using IBM SONAS and Tivoli Storage Manager.

The third, and best, approach is to build a data management practice, to begin classifying data at the point of creation, and to apply automated policies to move data into an archive over time.  That is, of course, as challenging a project as it is an effective solution.  Generally speaking, however, the more granular the archive process, the better it does the job.

Conclusion

The six trends enumerated above are what we might see – if storage planners suddenly develop common sense in 2012.  It is within our grasp today to make infrastructure more manageable, to virtualize storage infrastructure so it can be delivered to applications and to data as a service, to rationalize storage into logical pools based on savvy metrics including those related to power consumption, and to begin sorting out the storage junk drawer and implementing an active archive.  Doing these things could very well see a bright new horizon for storage efficiency by year’s end.

Copyright © 2012 by the Data Management Institute LLC.  All Rights Reserved.

I hope this is useful to readers.  I have provided a PDF for publication on IBM’s Storage Community and I am providing a link for Hot Storage Trends 2012 here for download.