My Head Hurts
No, I don't think it's a tumor. I have lots of reasons.
First, I seem to find myself in the middle of a big debate between IBM's bloggers extraordinaire over this matter of IBM's estimate of how many x86 machines you can virtualize on a z10 mainframe. Here's the note I just posted in response to Tony's blog at IBM.
Thanks for the response, Tony. I am waiting for the dust to settle between you and Jeff over at Sun. I will echo here what I just wrote over on his blog.
1. I was a mite sore when a qualified guy like Sun's Jeff Savit suggested I was all wet for quoting IBM's press release and conversations with IBM techs.
2. I appreciate it when a knowledgeable guy [Savit] gets in the middle and submits a view that does not attack me personally, but rather questions facts in an intelligent way. [Some folks at EMC might learn from this.]
3. I really appreciate it when [IBM] Tony Pearson jumps in and answers my blegs, as you have so thoughtfully done, in a timely way.
I was (and still am) inclined to believe that I am going to get more resiliency, better performance and a better price if I am gung ho about virtualizing a bunch of x86 machines and I use a mainframe LPAR rather than a tinkertoy hypervisor approach. That is my mainframe bias showing through.
From what I've learned from x86 engineers and from my own testing in my labs, VMware is a wonderful piece of technology from the standpoint of its respect for x86 extents. However, not only the hypervisor, but also the applications must respect the underlying extent code for everything to be shiny. Many apps don't, which seems to put a burden on VMware to catch all the crazy calls and prevent them from destabilizing the stack.
That is a technically non-trivial task and one that seems to account for the many abends we have had in our labs and the poor record of crash recovery failover, even when both VM servers are in the same subnet.
After reading your post and Jeff's, there is still some confusion about the number (and type) of machines we can virtualize, both on a VMware server and in a z10 LPAR. You and I agree that we are limited to 16-20 VMs in a virtual x86 server, but Jeff says it is two to three times that many. Jeff's initial objection to IBM's claims was that z didn't provide sufficient LPARs to host 1500 VMs.
Also, some of the services I was counting on to deliver resiliency (e.g. multiple processors with failover) were not, in Jeff's view, part of the configuration priced to come up with the "1500 VMs at $600-odd per" calculation proffered by Big Blue.
Thirdly, I argued in my Mainframe Exec piece that you were going to realize greater resource efficiency -- especially storage efficiency -- behind the z because of its superior management paradigm (SMS and HSM). Distributed computing just doesn't have these tools, or a common standard (de facto or de jure) for storage attachment and management that approximates mainframe DASD rules. As a result, the storage vendors duke it out at the expense of the consumer in terms of common management and ultimately efficiency.
Jeff said these tools had not been ported to z/OS, or that if they had (I need to go check my notes on this), they were not part of the suite of tools that would be available for use in a z/VM environment (which you must use in order to support LPARs).
These three issues seem pretty key to me. And frankly I remain a tad confused.
I really don't want this to turn into some sort of fun with numbers thing. I want facts.
I know you can crowd a lot more than 16-20 VMs into a VMware server, provided that you have enough cores and memory to support them. But, my read is that the VMs and their apps had better be very well behaved and not terribly resource intensive. (I invite someone to correct me if I am wrong, here, by the way.)
Think about it. If you have to ratchet up machine resources to host 20+ VMs, then won't you be paying a heck of a lot more for the VMware hosting server platform than you would for your standard, unvirtualized 2-way or 4-way job? That increases the hosting price.
Then, if I am not mistaken, don't you need to buy a VMware license (not the freebie version, but the one with more costly and sexier functionality) for each core? And, of course, wouldn't you need to license your hosted software for each VM instance?
Am I missing something here? Don't you need an $80K+ server to host 40 or 60 VMs (as Sun suggests)? If so, IBM seems to have an edge with a $600-x per server model, if its claims are true.
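To put rough numbers on that question, here is a back-of-the-envelope sketch. It takes the "$80K+" server as exactly $80,000 and IBM's "$600-odd" as a flat $600, and it counts hardware only. VMware and guest software licensing are left out because I don't have firm prices for them, so the real x86 numbers would be higher. The function names are mine, purely for illustration.

```python
import math


def cost_per_vm(server_price, vm_count):
    """Hardware cost per hosted VM (licensing deliberately excluded)."""
    if vm_count <= 0:
        raise ValueError("vm_count must be positive")
    return server_price / vm_count


def breakeven_vm_count(server_price, target_per_vm):
    """Smallest whole number of VMs that gets per-VM cost down to the target."""
    return math.ceil(server_price / target_per_vm)


SERVER_PRICE = 80_000  # assumption: "$80K+" treated as exactly $80,000
IBM_CLAIM = 600        # assumption: "$600-odd" treated as a flat $600

for vms in (20, 40, 60):
    per_vm = cost_per_vm(SERVER_PRICE, vms)
    print(f"{vms} VMs on one host: about ${per_vm:,.0f} per VM")

print("VMs needed to match the IBM figure:",
      breakeven_vm_count(SERVER_PRICE, IBM_CLAIM))
```

On those assumptions, 40 VMs works out to $2,000 apiece and 60 VMs to roughly $1,333, and you would need to pack 134 VMs onto that one box before the hardware alone matched $600 per server. Treat it as a sanity check on the question, not a price quote.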
Here's another question: if the VMs you are building are a bunch of little file servers and low traffic web servers (which is what I am told most virtualized systems are today), then why virtualize at all? Why not just consolidate? Use a common file system for file servers and put each one in its own folder or namespace? For webservers, I host twenty or so low traffic sites pretty comfortably without VMware -- on a single machine with a Plesk manager sitting atop them all. So, why introduce a bunch of extra technology for chicken shite applications -- mainframe or VMware?
Another question: when you virtualize, do all those retired servers really get unplugged, or do we just find other stuff to do with them? Be honest. I am hearing a lot of IT guys now talking about "virtual server sprawl" -- the server sprawl that occurs after you virtualize. How do you take those little boxes off the line once and for all?
My head is really aching as I seek to wade into the issues so effectively raised by Sun and so effectively answered by IBM (until the two lock horns again). For what it's worth, Sun is arguing that the "90 percent resource utilization efficiency behind a mainframe" is a myth. They may be right. I have never been able to get to more than 75% by the calculations I was doing when using MVS a decade and a half ago, and that was with pure IBM DASD, HSM and SMS. So, I am inclined to agree with Sun on this point, regardless of how IBM does the math. That said, I have also never seen resource utilization efficiency of more than about 30 percent (and that's with lots of effort) behind Microsoft servers (though UNIX and Linux servers might have yielded a couple more points).
Does VMware really change this? I guess if you let VMFS take over your spindles and VMotion do its thing, you might get resource efficiency levels moving a bit higher, but how much higher seems like another fun-with-numbers game in which your mileage will most certainly vary.
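For a sense of why those utilization percentages matter, here is a small sketch of the arithmetic. It assumes "utilization efficiency" means the fraction of purchased capacity that actually holds useful work, which is my reading and not a formal definition, and the 10 TB workload is a made-up number for illustration only.

```python
def raw_capacity_needed(useful_tb, utilization):
    """Capacity you must buy to hold `useful_tb` at a given utilization (0-1]."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return useful_tb / utilization


USEFUL_TB = 10  # hypothetical workload size, not a measured figure

# 30%: what I have seen behind Microsoft servers
# 75%: the best I ever calculated on MVS
# 90%: the mainframe figure Sun calls a myth
for label, util in (("30%", 0.30), ("75%", 0.75), ("90%", 0.90)):
    needed = raw_capacity_needed(USEFUL_TB, util)
    print(f"At {label} utilization: buy about {needed:.1f} TB "
          f"to hold {USEFUL_TB} TB of useful data")
```

The point is that the gap between 75 and 90 percent costs a couple of terabytes in this example, while the gap between 30 and 75 percent costs about twenty. Whatever the true mainframe number is, the distributed number is the one doing the damage.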
We are about to launch into a comparison test of just VMware and Virtual Iron in our labs, which will probably be followed by a bake-off around some of the other V approaches out there. I will report the results we receive here, just for giggles. Someone has got to try to break down the marketecture and publish some straight foo here.
My other headache: this past week has been grueling travel combined with very spotty communication via email. Someone decided to use my name to send out billions of emails for Viagra or faux watches or whatever, and the bounceback traffic from every server's spam filter has created utter havoc for my email servers. A lot of folks have contacted me to tell me that they are getting email saying that my mailbox doesn't exist (which seems to result from lengthy mail queues and timeouts). I think it has died down now, so if you want to send me mail, it should probably work.
Anyway, the lack of coherent email communications on the road has caused me lots of pain with clients and others. I welcome advice from anyone out there regarding ways to stop this from happening again.
Headache number 3? Air travel. In the six flights I took this week, two were delayed -- one so much so that I had to get an itinerary change because I would have missed my connection. The last flight, which was supposed to get me into Tampa by 11PM, dumped me on the baggage carousel at closer to 2AM. I also spent an enjoyable couple of hours in the last row of a US Airways flight next to two guys with shoulders as wide as mine, forcing me to cheat to the right. I am still paying for it with a crick in my neck that is very painful and refuses to go away.
Happy Father's Day to the deserving from DrunkenData.com. Fortunately, my six kids are not a source of my headache today.