Xmas Comes Early

Wow! We must be starting a trend.

In response to questions raised here, and probably to more than a few from consumers in the real world, about how NetApp’s entry-level box, StoreVault, manages capacity, Drew Meyer at NetApp put together a dandy whitepaper. Since we helped provoke it, I have uploaded it to the DrunkenData site and made it available to anyone who is still struggling with why we want to pay that much money for only that much capacity and software.

NetApp StoreVault Paper

NOTE TO OTHER VENDORS: Don’t think that just because I posted this paper at my own cost for space and bandwidth that I will also hold your papers. This one is hosted only because we put Drew through so much emotional stress that he wrote it just for us.

7 Responses to “Xmas Comes Early”

  1. Drew Meyer Says:

    Cheers, Jon. This took longer than I had hoped but we wanted to ensure that the math was precise. I also want to thank my compatriot Steve Wilkins for his thoughtful writing.

    Block checksums are a particularly nifty feature dedicated to disk protection, and I wish we could get everything for free… but the world doesn’t seem to work that way.

    We hope that this is useful to end users and resellers sizing their systems.

  2. Jeremy Says:

    On a whim I downloaded the whitepaper, and after perusing the sales-type stuff at the beginning I reached the charts at the bottom, whereupon I nearly spit my water onto my screen. They are claiming nearly 70% overhead?!? 70%, that’s seven-zero, right? In this day and age, when people are claiming that RAID 5 has too much overhead, NetApp has a product where you only get to actually USE 30% of what you pay for?!? I guess that works if you REEEALLLY want to be sure your data comes back out of your suddenly tiny storage array; but on a cost per GB basis I can’t help but sit back and laugh.

  3. TomTreadway Says:

    First, Drew, excellent white paper – well written and easy to understand. And, Jon, thanks for sharing it on your website. I’m not sure either of you intended to encourage a comment-conversation via DrunkenData, so feel free to delete my post if it’s not appropriate. (Perhaps I could move it to my website if you don’t mind me referencing this site or even reposting the whitepaper.) With that said, I had a few questions…

    Block Checksum: The paper describes this as a method for detecting data corruption. I assume that’s a checksum on each 512 byte block, extending each disk sector to 520 or 528 or whatever number of bytes. (The paper wasn’t clear which, but it’s probably not important.) So have you considered using the T10 (SCSI/SAS) Protection Information Model (PIM)? It would seem to solve the same problem as Block Checksum. But are there any pros or cons to using PIM in the NetApp model? One potential pro is that this protection could be extended into the operating system or application, assuming they support PIM someday.

    Right-Sizing: The paper points out that drives typically vary by a small amount due to defect mapping during manufacturing. The “least common denominator” method mentioned is very common in the industry. But I was surprised to see that NetApp uses this extra space for defect management. Is this due to concerns that the drive doesn’t remap “well enough”? I agree that could be the case with SATA drives. Or is this space only used after the drive’s defect map is full? If so, I question whether this drive should continue to be used. Or perhaps the reason is something completely different.

    RAID-4 Advantage over RAID-5: The paper mentioned that adding a drive to a RAID-5 involved moving the entire array to scratch space, rebuilding the array, and moving the data back. Maybe I misunderstood, but that’s not how typical RAID controllers work today. The RAID-5 expansion is typically done online with no disruption to user access of their data. The paper then goes on to say that a drive can be added to a RAID-4 without rebuilding the array. Does that imply that the drive is concatenated, i.e., not striped into the existing array? That would seem to cause a big performance hit. Or does the WAFL implementation obfuscate how the blocks are stored and perhaps get around this issue?

    Again, thank you for posting this. It was a good read. Jon, let me know if this level of discussion is not what you intended.

    TT
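
To make the block checksum question above concrete, here is a minimal sketch of a per-sector checksum. It assumes the 512-byte-data, 520-byte-sector layout that the comment guesses at, and it uses CRC-32 from Python's standard library purely as a stand-in; the paper does not say which checksum or sector size StoreVault actually uses, and the names `protect` and `verify` are made up for illustration.

```python
import zlib

SECTOR_DATA = 512   # user data bytes per sector
CHECKSUM_LEN = 8    # assumed, giving the 520-byte sector guessed at above


def protect(data: bytes) -> bytes:
    """Append a checksum field to one sector's worth of data."""
    if len(data) != SECTOR_DATA:
        raise ValueError("expected exactly 512 bytes of data")
    # CRC-32 is a stand-in here, zero-padded to fill the 8-byte field.
    return data + zlib.crc32(data).to_bytes(CHECKSUM_LEN, "big")


def verify(sector: bytes) -> bool:
    """Return True if the stored checksum still matches the data."""
    if len(sector) != SECTOR_DATA + CHECKSUM_LEN:
        return False
    data, stored = sector[:SECTOR_DATA], sector[SECTOR_DATA:]
    return zlib.crc32(data).to_bytes(CHECKSUM_LEN, "big") == stored


good = protect(bytes(SECTOR_DATA))
damaged = bytearray(good)
damaged[0] ^= 0xFF  # simulate silent corruption of one data byte

print(verify(good), verify(bytes(damaged)))
print(f"capacity cost: {CHECKSUM_LEN / (SECTOR_DATA + CHECKSUM_LEN):.1%}")
```

Under these assumptions the checksum field costs about 1.5% of each sector, which is small next to the parity, spare, and snapshot figures discussed in the thread.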

  4. TomTreadway Says:

    Jeremy, to understand the 70% loss you will need to read the sections that you skipped.

    First, tolerating two drive failures requires EXACTLY two additional drives. This is the mathematical minimum.

    Second, a hot spare is a dedicated drive - by definition. The only way to avoid having this dedicated drive is to not use a hot spare. Personally, I wouldn’t use a hot spare if I had RAID-DP.

    Third, the amount of space reserved for snapshots is arbitrary. NetApp apparently chose 20%. This could be made smaller, but the number of snapshots that fit will be reduced. Some customers may even want more than 20% reserved.

    Bottom line is that if you buy a 6-drive machine and turn on all these features then you’re only going to get 30%. I don’t see how it could be otherwise. It’s not a Marketing issue - it’s technically “just how it works”.

    TT
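
As a rough check on the arithmetic in this comment, the sketch below accounts only for the three items it lists: two parity drives, one hot spare, and a 20% snapshot reserve. The function name and parameters are illustrative, not taken from the paper, and block checksum and right-sizing overhead, which the thread mentions but does not quantify, are deliberately left out.

```python
def usable_fraction(drives, parity=2, spares=1, snapshot_reserve=0.20):
    """Fraction of raw capacity left for user data.

    Models only parity drives, hot spares, and snapshot reserve.
    Checksum and right-sizing losses are NOT modeled.
    """
    data_drives = drives - parity - spares
    if data_drives <= 0:
        raise ValueError("not enough drives for the requested parity and spares")
    return data_drives * (1.0 - snapshot_reserve) / drives


six = usable_fraction(6)
print(f"6 drives, RAID-DP + spare + 20% reserve: {six:.0%} usable")
```

On these assumptions a 6-drive system comes out at 40% usable, so the remaining gap to the paper's 30% figure presumably comes from the items not modeled here.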

  5. Jon Toigo Says:

    Not at all, Tom. I agree with your tone and level of response. At least, with the whitepaper posted, I cannot be accused of letting loose on NetApp without the facts in hand.

    I am letting readers bird dog any marketecture in the architecture.

  6. Drew Meyer Says:

    And the conversation continues…

    First of all, Jeremy, the tables at the end show you clearly the percentage used. In the very worst case possible, a 6 drive system yields 30% after the assumptions are applied. In reality, we don’t see customers buying a system with 6 drives and dedicating three of them to parity because that would be silly. You are welcome to use a single parity drive and be no worse off than a RAID-5 array, then turn down the snapshots and get close to 25% overhead. But that’s no better than the competition and decreases the value of ONTAP StoreVault Edition.

    The realistic numbers really start to kick in at the higher drive quantities, like 10-12 drive systems with RAID-DP and a hot spare. Assuming reasonable behavior, I recommend that end users and resellers apply a StoreVault rule of thumb of 40% overhead with the snapshot reserve assumption. For more precise answers, use the tables in the paper.

    For adding disks to our RAID 4 array, we simply lengthen the stripe. Read more about this architecture in the white paper on our site. There’s no performance impact to adding disks to a StoreVault and you can do it any time you want. I was not aware of on-the-fly RAID-5 expansion in similar storage servers, and I know that with Snap Servers expansion means grouping two RAID arrays together, costing another parity drive. If Adaptec controllers can extend RAID-5 without a performance hit, then I am newly educated. RAID 4 gives scalability and the option of RAID-DP at any time, with no performance hit.

    In the StoreVault world, we always recommend a belt-and-suspenders approach of RAID-DP and a Hot Spare. Dual Parity RAID (like our RAID-DP and other vendors’ RAID 6) protects against a concurrent drive failure and the Hot Spare replaces a failed disk if there is no human present, ensuring maximum levels of protection. StoreVault’s block checksum feature lets us watch for disk sectors that are going bad, and perform a “Rapid RAID Rebuild” in the background. Essentially we can migrate data from a suspect disk to a hot spare invisibly and prevent a rebuild from ever occurring. Our position is that rebuilds are risky and expensive so we try to prevent them as much as possible. For the cost of one drive, why not?

    We offer 255 snapshots per volume, regardless of the space reservation. The amount of space used depends on the rate of change for the data (which is also impacted by the frequency of snapshotting). Users are free to dial the reservation up or down as they see fit.

    As far as right sizing, we’re pretty aggressive. ONTAP SVE comes from a world where disk capacity was cheap in the big picture, so we are re-learning to treasure every megabyte as we move down market. I expect to see us get more efficient with right-sizing capacities in the future.

    Hope that clarifies a few of our thoughts on the product design.

    Drew

    PS Tom, I don’t know the PIM - does it work with SATA?
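
The point in this comment about lengthening a RAID 4 stripe can be illustrated with textbook XOR parity. This is a minimal sketch, not NetApp's WAFL or RAID-DP implementation: it assumes a single dedicated parity disk and assumes the newly added data disk is zero-filled. The helper name `xor_parity` is made up for the example.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR equal-length byte blocks together, as a dedicated parity disk would."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across three data disks (4 bytes per disk for brevity).
stripe = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity_before = xor_parity(stripe)

# Add a fourth, zero-filled data disk: the stripe simply gets longer.
stripe.append(bytes(4))
parity_after = xor_parity(stripe)

# XOR with zeros changes nothing, so the parity disk needs no rewrite.
print(parity_before == parity_after)

# The parity still protects the wider stripe: rebuild disk 0 from the rest.
rebuilt = xor_parity([parity_after] + stripe[1:])
print(rebuilt == stripe[0])
```

Because the parity lives on its own disk and XOR with zeros is a no-op, no existing block has to move when the disk is added. With rotating parity, as in conventional RAID-5, widening the stripe changes where parity blocks land, which is why expansion there involves redistributing data.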

  7. TomTreadway Says:

    Drew, thanks for the response.

    Regarding PIM, sadly it does not work with SATA. Good point. Hopefully that will change some day.

    TT
