
COPAN Systems Reaches Out

by Administrator on June 27, 2008

Had an interesting conversation with Chris Santilli yesterday at COPAN Systems. I was originally told it would be a briefing on the latest feature/function set from COPAN, but I think it was also intended to give them an opportunity to respond to the questions I have been raising here and on various stages regarding the actual green-ness of Massive Array of Idle Disks (MAID), which is of course their bailiwick.

Their new stuff consists of an add-in VTL cache: a bunch of non-MAID disks that sit in front of their MAID array and behind their VTL head server. Why do we need it? Chris says that different streams of data come in at different speeds, so they want the cache to provide a buffer for receiving data that can subsequently be ingested and de-duped into the MAID array. Makes sense, since it is challenging to tune arrays for multiple concurrent writes coming in at different speeds. They are supporting most of the speeds and feeds alternatives of FC, and are “qualifying” 10 GbE connections now.

On to the juicy stuff:

We chatted about what users were telling me, including the one who mentioned the dilemma created when doing block-level de-dupe and also doing defrag on primary storage disk, which changes block locations and may drive de-dupe engines crazy.  Chris acknowledged that the problem existed in an almost Monty Python-esque moment.  Remember the sketch “Stake Your Claim”?

Stake your Claim from “Monty Python’s Previous Record”

Game Show Host (John Cleese): Good evening and welcome to Stake Your Claim. First this evening we have Mr Norman Voles of Gravesend who claims he wrote all Shakespeare’s works. Mr Voles, I understand you claim that you wrote all those plays normally attributed to Shakespeare?

Voles (Michael Palin): That is correct. I wrote all his plays and my wife and I wrote his sonnets.

Host: Mr Voles, these plays are known to have been performed in the early 17th century. How old are you, Mr Voles?

Voles: 43.

Host: Well, how is it possible for you to have written plays performed over 300 years before you were born?

Voles: Ah well. This is where my claim falls to the ground.

Host: Ah!

Voles: There’s no possible way of answering that argument, I’m afraid. I was only hoping you would not make that particular point, but I can see you’re more than a match for me!

Host: Mr Voles, thank you very much for coming along.

Voles: My pleasure.

Host: Next we have Mr Bill Wymiss who claims to have built the Taj Mahal.

Wymiss (Eric Idle): No.

Host: I’m sorry?

Wymiss: No. No.

Host: I thought you cla…

Wymiss: Well I did but I can see I won’t last a minute with you.

Host: Next…  

Chris noted that de-duplication technology was past the hype stage (not sure about that one) but that the technology was still undergoing substantial development, rather like compression in its early days: a lot of variations, no standards. He further noted that some interesting work was being done by companies such as Ocarina on improved file type awareness that might help mitigate some nagging technical issues involving de-dupe of data on disks that had been defragged. (Lots of “D’s” in that sentence.)
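A toy model may help show one way a defrag could upset a block-level de-dupe engine. This is a sketch under loud assumptions, not a description of any vendor's engine: the "volume" is a made-up image of 64 clusters, the "defrag" is a random shuffle of whole clusters (a stand-in for any relocation that leaves content untouched), and the engine is assumed to hash fixed-size chunks. All names and sizes here are hypothetical.

```python
import hashlib
import random

CLUSTER = 4096  # hypothetical filesystem cluster size, in bytes


def make_volume(num_clusters: int = 64) -> bytes:
    """Build a toy volume image in which every cluster has distinct content."""
    return b"".join(i.to_bytes(2, "big") * (CLUSTER // 2) for i in range(num_clusters))


def simulate_defrag(image: bytes, seed: int = 0) -> bytes:
    """Relocate whole clusters without changing any cluster's content.

    A random shuffle is only a stand-in for a real defragmenter.
    """
    clusters = [image[i:i + CLUSTER] for i in range(0, len(image), CLUSTER)]
    random.Random(seed).shuffle(clusters)
    return b"".join(clusters)


def chunk_hashes(image: bytes, chunk_size: int) -> set[bytes]:
    """Hash fixed-size chunks, as a simple fixed-block de-dupe engine might."""
    return {
        hashlib.sha256(image[i:i + chunk_size]).digest()
        for i in range(0, len(image), chunk_size)
    }


def new_chunks_after(before: bytes, after: bytes, chunk_size: int) -> int:
    """Count chunks in `after` that were never seen in `before`."""
    return len(chunk_hashes(after, chunk_size) - chunk_hashes(before, chunk_size))


before = make_volume()
after = simulate_defrag(before)

# Case 1: de-dupe chunks line up exactly with the clusters that were moved.
aligned = new_chunks_after(before, after, chunk_size=CLUSTER)
# Case 2: each de-dupe chunk spans two clusters, so moves split old chunks.
misaligned = new_chunks_after(before, after, chunk_size=2 * CLUSTER)

print(f"new chunks, chunk == cluster:    {aligned}")
print(f"new chunks, chunk == 2 clusters: {misaligned}")
```

In this model no file content changes at all. When the chunk size equals the cluster size, the engine sees nothing new. When a chunk spans more than one cluster, most chunks look new after the shuffle, so the engine would store them again.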

I asked him whether companies that were de-duping with current algorithms might eventually be required to un-ingest, then re-ingest, massive amounts of data using improved de-dupe technology. He acknowledged that this would more than likely be the case.

His candor was much appreciated, and it prompted me to ask him about another issue: the compliance status of data that has been de-duped. Does de-dupe change data in a manner that does not fit with regulators’ concepts of “original and unaltered form”? I reminded him that many of my clients are excluding certain data from de-dupe for just that reason.

Again to my surprise, he agreed.  “You are messing with the format of the data.  That opens a lot of questions.”

For my part, I do not want my clients to become test cases for the legal validity or regulatory compliance of de-duped documents.  Also, I find it strangely amusing that Data Domain has the audacity to pull an EMC and to declare their platform a Compliance Lock In in today’s news.  More on that in a later post.

Finally, we got around to discussing my contention that MAID isn’t green. I referred Chris to a study at Penn State on the subject of MAID in primary storage environments (which seems to be what COPAN aspires to become) that showed how, in primary storage at least, databases and other apps don’t give drives time to spin down. I should also have sent him to another study suggesting that drive idle power savings are eaten up by the additional power required to spin drives back to life. Finally, I made the observation that you don’t green anything by plugging more hardware into the wall.

He made some good points in response.  He said, for example, that any platform that positions data based on access is greener than one that doesn’t.  I agree.  But can’t I do this myself, without MAID, using good archive software and much less expensive disk, tape and optical targets?

He said he had some stats on drive spin-down and spin-up wattage requirements and could show how the COPAN approach actually resulted in a net energy savings. I invited him to send that my way, and I will post it here.
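The spin-down argument comes down to a break-even calculation, sketched below. The wattage and energy figures are invented for illustration and are not COPAN's numbers or measurements from either study. The model also makes a simplifying assumption: the whole cost of a spin-down/spin-up cycle is treated as one lump of extra energy, and the time spent in transition is ignored.

```python
def break_even_seconds(idle_watts: float, standby_watts: float,
                       transition_joules: float) -> float:
    """Idle gap (seconds) beyond which spinning a drive down saves energy.

    transition_joules is the extra energy for one spin-down plus spin-up cycle.
    """
    saved_per_second = idle_watts - standby_watts
    if saved_per_second <= 0:
        raise ValueError("standby must draw less power than idle")
    return transition_joules / saved_per_second


def net_savings_joules(gap_seconds: float, idle_watts: float,
                       standby_watts: float, transition_joules: float) -> float:
    """Energy saved (positive) or wasted (negative) by spinning down for one gap."""
    stay_spinning = idle_watts * gap_seconds
    spin_down = standby_watts * gap_seconds + transition_joules
    return stay_spinning - spin_down


# Hypothetical drive: 8 W spinning idle, 1 W in standby, 140 J per cycle.
IDLE_W, STANDBY_W, CYCLE_J = 8.0, 1.0, 140.0

print(break_even_seconds(IDLE_W, STANDBY_W, CYCLE_J))        # 20.0 seconds
print(net_savings_joules(10, IDLE_W, STANDBY_W, CYCLE_J))    # -70.0 (a loss)
print(net_savings_joules(3600, IDLE_W, STANDBY_W, CYCLE_J))  # 25060.0 (a gain)
```

With these made-up figures, a drive that is woken every ten seconds wastes energy by spinning down, while one left alone for an hour saves a good deal. Which regime a given workload falls into is exactly what real measurements would have to settle.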

In the meantime, thanks, Chris, for the chat.

3 comments

gknieriemen June 27, 2008 at 12:47 pm

Jon:

You probably already saw this but I’d be interested in hearing your critique:

MAID 2.0: Energy Savings without Performance Compromises
http://www.storageio.com/Reports/StorageIO_WP_Jan02_2008.pdf

hirni June 28, 2008 at 4:14 am

I too scratched my head a bit when I heard about COPAN’s claim of MAID-LUNs for fileservers … I just cannot see the reason.
For a VTL-software (which controls access to its LUNs) a MAID makes perfect sense, esp. because a VTL is nothing else than a SEQUENTIAL-ACCESS emulation for disks, and it already has to deal with tape-mount-times …

But for RANDOM-ACCESS, which is what normal filesystems actually do, the MAID-feature just makes ZERO sense …
Actually I do not know any available filesystem which has awareness of spun-down LUNs … for allocations etc.
(unless you unmount it completely for longer times)
It’s an interesting area of research 🙂 (Politically correct words)

Overall – I don’t see MAID getting any use outside of VTL-like environments where backend-disk-access is tightly controlled by a MAID-aware software … (like for archives)

csantilli July 15, 2008 at 7:06 pm

Well….nothing like a little positioning of MAID for datacenter use cases…

I did review the “Interplay of Energy and Performance for Disk Arrays Running Transaction Processing Workloads” research paper.

As with all research papers, there is a point to make. In this paper it is clear that the research was to determine the power saving of disk arrays using transactional workloads. The use of the TPC-C and TPC-H workloads denotes that the research was to determine the power savings on Tier1 and Tier2 storage.

In a datacenter there are multiple tiers of storage. To illustrate the use case for MAID, I will denote 5 Tiers of storage: Tier 1: Enterprise data creation, Tier 2: low cost storage for higher capacity, Tier 3: Reference and Archive data, Tier 4: Backup/Recovery data, Tier 5: Remote storage

The research paper points out that there is no “real” power savings when combining power modes (idle, standby) with use cases addressing Tier1 and Tier2 storage…I agree.

Remember, MAID stands for Massive Array of Idle Disks. The use of power-managed disk drives for Tier3 and Tier4 is an excellent model for MAID arrays. The access profile for Tier3 and Tier4 is sequential and/or batch writing with occasional read access. The profile denotes that the data on Tier3 and Tier4 is aged, with retrieval less than 50% (actually more like 10%).

To summarize, the use of power-managed disk drives is targeted at persistent data (Write Once, Occasionally Accessed, Rarely Modified). The research paper supports this positioning of MAID for Tier3 and Tier4…not the use of power-managed disk drives in Tier1 and Tier2.
