
Catfight over FCoE

by Administrator on April 29, 2007

Just when you thought things were wrapping up on the IP Storage Working Group reflector at IETF (where iSCSI was hatched), a cat fight broke out while I was on the road.  Subject:  Fibre Channel over Ethernet, being developed over at ANSI T-11.  Here is the thread, which begins with this note from a guy who I regard as a king in the storage protocol world, Julian Satran at IBM:

Dear All,

The trade press is lately full of comments about the latest and greatest reincarnation of Fibre Channel over Ethernet. It made me try to summarize all the long and hot debates that preceded the advent of iSCSI.

Although FCoE proponents make it look like no debate preceded iSCSI, that was not so – FCoE was considered even then, and it was dropped as a dumb idea.

Here is a summary (as far as I can remember) of the main arguments. They are not bad arguments even in retrospect, and technically FCoE doesn’t look any better than it did then.

Feel free to use this material in any form. I expect this group to seriously expand my arguments and make them public – in personal or collective form.

And do not forget – it is a technical dispute – although we all must have some doubts about the way it is pursued.

Regards,

Julo

What a piece of nostalgia 🙂

Around 1997, when a team at IBM Research (Haifa and Almaden) started looking at connecting storage to servers using the “regular network” (the ubiquitous LAN), we considered many alternatives (another team even had a look at ATM – still a computer-network candidate at the time). I won’t take you through all of our rationale (and we went over some of it again at the end of 1999 with a team from Cisco, before we convened the first IETF BOF in 2000 at Adelaide that resulted in iSCSI and all the rest), but the reasons we chose to drop Fibre Channel over raw Ethernet were multiple:

Fibre Channel Protocol (SCSI over the Fibre Channel link) is “mildly” effective because:

  • it implements endpoints in a dedicated engine (offload)
  • it has no transport layer (recovery is done at the application layer under the assumption that the error rate will be very low)
  • the network is limited in physical span and logical span (number of switches)
  • flow control/congestion control is achieved with a mechanism adequate for a limited-span network (credits – a toy sketch of the credit scheme follows this list)
  • the packet loss rate is almost nil, which allows FCP to avoid using a transport (end-to-end) layer
  • FCP switches are simple (addresses are local and the memory requirements can be limited through the credit mechanism)
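
That credit mechanism is worth a moment. Here is a minimal sketch (mine, not Julian's, and not any vendor's implementation) of buffer-to-buffer credit flow control as it works in spirit: the transmitter may only send while it holds credits, and each credit comes back (the R_RDY primitive in real Fibre Channel) when the receiver frees a buffer. All class and variable names are illustrative.

from collections import deque

class CreditLink:
    """Toy model of FC-style buffer-to-buffer credit flow control.

    The transmitter starts with credits equal to the receiver's buffer count,
    spends one credit per frame sent, and gets one back each time the receiver
    drains a buffer (the R_RDY primitive in real Fibre Channel).
    """

    def __init__(self, receiver_buffers: int):
        self.credits = receiver_buffers      # BB_Credit granted at login
        self.rx_buffers = deque()            # frames parked at the receiver

    def try_send(self, frame) -> bool:
        if self.credits == 0:
            return False                     # back-pressure: the sender must wait
        self.credits -= 1
        self.rx_buffers.append(frame)        # frame occupies a receive buffer
        return True

    def receiver_drain(self) -> None:
        if self.rx_buffers:
            self.rx_buffers.popleft()        # buffer freed ...
            self.credits += 1                # ... credit returned (R_RDY)

link = CreditLink(receiver_buffers=4)
sent = sum(link.try_send(f"frame-{i}") for i in range(6))
print(sent)                          # 4 -- frames 5 and 6 are held back, never dropped
link.receiver_drain()
print(link.try_send("frame-7"))      # True -- one credit came back

Note the design point the thread keeps circling: nothing is ever dropped, the sender simply stalls, and the receiver's buffer count bounds how much data can be in flight – which is exactly the property that becomes hard to sustain once the credit loop spans a large network.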

However

  • FCP endpoints are inherently costlier than simple NICs – the cost argument (initiators are more expensive)
  • The credit mechanism is highly unstable for larger networks (check switch vendors’ planning docs for the network diameter limits) – the scaling argument
  • The assumption of low losses due to errors might radically change when moving from 1 to 10 Gb/s – the scaling argument
  • Ethernet has no credit mechanism, and any mechanism with a similar effect increases the endpoint cost.
  • Building a transport layer in the protocol stack has always been the preferred choice of the networking community – the community argument
  • The “performance penalty” of a complete protocol stack has always been overstated (and overrated). Advances in protocol stack implementation and finer tuning of the congestion control mechanisms make conventional TCP/IP perform well even at 10 Gb/s and beyond.
  • Moreover, the multicore processors that are becoming dominant on the computing scene have enough compute cycles available to make any “offloading” possible as a mere code-restructuring exercise (see the stack reports from Intel, IBM, etc.)
  • Building on a complete stack makes available a wealth of operational and management mechanisms built over the years by the networking community (routing, provisioning, security, service location etc.) – the community argument
  • Higher level storage access over an IP network is widely available and having both block and file served over the same connection with the same support and management structure is compelling – the community argument
  • Highly efficient networks are easy to build over IP with optimal (shortest-path) routing, while Layer 2 networks use bridging and are limited by the logical tree structure that bridges must follow. The effort to combine routers and bridges (RBridges) promises to change that, but it will take some time to finalize (and we don’t know exactly how it will operate). Until then, the scale of Layer 2 networks is going to be seriously limited – the scaling argument

As a side argument – a performance comparison made in 1998 showed SCSI over TCP (a predecessor of the later iSCSI) performing better than FCP at 1 Gb/s for block sizes typical of OLTP (4-8 KB). That was what convinced us to take the path that led to iSCSI – and we used plain-vanilla x86 servers with plain-vanilla NICs and Linux (with similar measurements conducted on Windows).

The networking and storage community acknowledged those arguments and developed iSCSI and the companion protocols for service discovery, boot etc.

The community also acknowledged the need to support existing infrastructure and extend it in a reasonable fashion, and developed two protocols: iFCP (to let hosts with FCP drivers and IP connections connect to storage through a simple conversion from FCP to TCP packets) and FCIP (to extend the reach of FCP through IP by connecting FCP islands over TCP links). Both have been implemented and their foundation is solid.

The current attempt of developing a “new-age” FCP over an Ethernet link is going against most of the arguments that have given us iSCSI etc.

It ignores networking layering practice, builds an application protocol directly above a link (and thus limits scaling), mandates elements at the link layer and application layer that make applications more expensive, and leaves aside the whole “ecosystem” that accompanies TCP/IP (and not Ethernet).

In a related effort (and at one point also while developing iSCSI) we also considered moving away from SCSI (like some non-standardized software, popular in some circles, did – e.g., NBP) but decided against it.

SCSI is a mature and well understood access architecture for block storage and is implemented by many device vendors. Moving away from it would not have been justified at the time.

Nice recap of the thinking behind iSCSI. It prompted this response from Zack Best…

The real debate here is between two types of networks. The first is reliable at the link level and does not drop packets under congestion. The second is running a reliable transport protocol (i.e. TCP) over an unreliable link level network.

I agree with the scaling argument. For sufficiently large networks, a reliable link level doesn’t work well because network component failures or chronically congested links are not handled well. For sufficiently small networks, a reliable link level has some significant advantages in simplicity, low hardware cost, performance, and worst-case latency.

My personal view is that the vast majority of enterprise storage networks fall in the “sufficiently small” category. This view has to some extent been vindicated by the continuing success of Fibre Channel in this space and the inability of iSCSI to displace FC in any significant way for enterprise storage. Of course, this may or may not change in the future.

Whether FC is simpler than iSCSI depends largely on your definition of simplicity. If one defines simplicity/complexity as the number of gates or lines of code to reduce the protocol to hardware or firmware, then my experience is that iSCSI is 2X to 3X the complexity of FC. This has implications in cost and reliability.

Particularly problematic with iSCSI is the unpredictability of its performance. Performance is great with no packet drops. However, even a small amount of congestion can cause a sudden, large drop in performance. This can be difficult to predict, as a network that is almost but not quite congested can run great, yet a small incremental change of any sort can cause the performance to become suddenly unacceptable.
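
There is a well-known back-of-the-envelope way to see why the drop is so sudden: the Mathis et al. approximation bounds steady-state TCP throughput at roughly (MSS/RTT) · 1.22/√p for packet-loss probability p. The sketch below is my illustration, not anything from the thread; the 1460-byte MSS and 0.5 ms RTT are assumed, data-center-ish numbers.

from math import sqrt

def mathis_throughput_bps(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Mathis/Semke/Mahdavi/Ott approximation of steady-state TCP throughput:
    rate <= (MSS / RTT) * (C / sqrt(p)), with C ~= sqrt(3/2) ~= 1.22."""
    C = sqrt(3.0 / 2.0)
    return (mss_bytes * 8 / rtt_s) * (C / sqrt(loss_rate))

# Assumed 1460-byte MSS and 0.5 ms RTT, i.e. the same class of fabric FC lives in
for p in (1e-6, 1e-4, 1e-2):
    gbps = mathis_throughput_bps(1460, 0.0005, p) / 1e9
    print(f"loss {p:.0e}: ~{gbps:,.1f} Gb/s ceiling")

Plugging in the numbers, a loss rate of one packet in ten thousand already caps a single connection well below a 10 Gb/s link's line rate, which is the cliff being described here.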

For FC, or other protocol using link level flow control, the reduction in performance is much more graceful and incremental when the level of congestion is small and intermittent.

A second major problem with iSCSI is the unbounded nature of worst-case latency. When a storage network fails, it is desirable to detect the failure in a fraction of a second and transition to a backup network. TCP, when implemented to the standards, can take many seconds or minutes to determine that a network has failed and close the connection. RFC 2988, for instance, requires that the minimum retransmission timeout be one second. This means a single dropped packet may add one second to the latency of outstanding commands. This is a huge amount of time on a 10G link. No doubt this could be mitigated by drastically reducing the timeouts within TCP, but the market seems to be surprisingly resistant to tampering with accepted standards here.
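
To make the RFC 2988 point concrete, here is a small sketch (mine, not Zack's) of the retransmission-timeout calculation that RFC specifies – smoothed RTT and RTT variance with the RFC's constants, then a clamp to the one-second floor. The 1 ms clock granularity is an assumption for illustration.

class Rfc2988Rto:
    """RTO computation per RFC 2988 (alpha = 1/8, beta = 1/4, K = 4),
    including the mandated one-second lower bound the thread objects to."""

    MIN_RTO = 1.0        # seconds -- the RFC 2988 floor
    G = 0.001            # assumed clock granularity of 1 ms (illustrative)

    def __init__(self):
        self.srtt = None
        self.rttvar = None

    def on_rtt_sample(self, r: float) -> float:
        if self.srtt is None:                    # first measurement
            self.srtt = r
            self.rttvar = r / 2.0
        else:
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - r)
            self.srtt = 0.875 * self.srtt + 0.125 * r
        rto = self.srtt + max(self.G, 4.0 * self.rttvar)
        return max(rto, self.MIN_RTO)

rto = Rfc2988Rto()
print(rto.on_rtt_sample(0.0002))   # a 200 us RTT on a fast fabric -> RTO is still 1.0 s

Even with a measured RTT of a couple of hundred microseconds, the timer cannot drop below one second without deviating from the RFC – which is exactly the “tampering” the market is said to resist.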

Overall, the FC and FCP protocol have a lot in common with the Intel i86 instruction set architecture. They are overly complex, and rather poorly designed by modern standards. But they are good enough, and there is a huge amount of value add that has been built on top of them, and therefore little incentive to change.

FCoE is an interesting idea because it preserves 90% of the existing value add of FC, unifies the physical link with Ethernet, and uses the reliable link method of packet delivery.

There are two significant possibilities for iSCSI to displace FC (or FCoE) in enterprise storage networks. First is if the networks start to scale to large enough size that FC can’t be made sufficiently reliable, and second if CPU compute cycles become sufficiently cheap that the iSCSI protocol can be run in host software with no negative performance impact.

Barring either of these, it seems that iSCSI will have an uphill battle, and FCoE may have a place.

Julian responds:

Excellent comments. My take (if not obvious from the previous text) is that data centers will be very large, and compute power (as evidenced by the multicore trend) and advances in stack implementation are bound to substantially improve the performance of the protocol stacks (see Intel’s and our work) and Layer 3 switching.

It is also important to point out that Ethernet has substantial latencies if only bridging is used, and replacement technologies (such as RBridges or others) may take some time to appear.

Julo

NAB from linux-iscsi.org responds:

A quick comment regarding the large amount of computing resources available for initiator-side software IP storage services… Also, Julo, many thanks for posting this great thread. 🙂

–nab

As the progress of the DDP TWG continues onward and second-generation hardware iWARP engines start to come online, the benefit of a hybrid software implementation – with host OS network stack modifications in the kernel, above TCP and SCTP – starts to pose a question:

What real savings can hybrid iSER nodes get from software DDP? What changes are required to make high-performance software DDP a reality?

As osc-iwarp has found out, there is a significant CPU overhead associated with sockets and software VERBS, but I think this can be minimized with the right set of changes. Those changes involve moving away from receive-side sockets for software iSER mode. They will start to become attractive for new product designs, as this will allow RNIC hardware engines to scale further using a saner (or less painful, depending on who you ask; OFA uses a hybrid IB-VERBs approach) method than traditional TOEs with specialty engines. Really taking advantage of what the metadata in DDP and iWARP tells us about the framed network transport can help in RDMA WRITE scenarios, because the software RNIC would already have STagged memory ready to go in the iSER case.
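
For readers who have not followed the DDP work, the “STagged memory ready to go” point is easier to see in a toy sketch. This is my illustration, not NAB's code and not the OFA VERBs API: a tagged DDP segment names a steering tag (STag) and an offset, and a software RNIC can place the payload directly into the pre-registered buffer that the tag identifies, with no intermediate socket buffer.

from dataclasses import dataclass

@dataclass
class TaggedSegment:
    """Minimal stand-in for a DDP tagged-buffer segment header plus payload."""
    stag: int            # steering tag naming a pre-registered buffer
    offset: int          # byte offset within that buffer (tagged offset)
    payload: bytes

class SoftwareRnic:
    """Toy 'software RNIC': places tagged segments directly into registered
    memory, which is the direct-data-placement idea behind iSER RDMA WRITEs."""

    def __init__(self):
        self._registered: dict[int, bytearray] = {}

    def register(self, stag: int, length: int) -> bytearray:
        buf = bytearray(length)              # e.g. a SCSI read buffer
        self._registered[stag] = buf
        return buf

    def place(self, seg: TaggedSegment) -> None:
        buf = self._registered[seg.stag]     # look up the advertised buffer
        end = seg.offset + len(seg.payload)
        buf[seg.offset:end] = seg.payload    # direct placement, no staging copy

rnic = SoftwareRnic()
scsi_buf = rnic.register(stag=0x1234, length=16)
rnic.place(TaggedSegment(stag=0x1234, offset=4, payload=b"DATA"))
print(bytes(scsi_buf))   # payload lands at offset 4 of the registered buffer

In the iSER read case that registered buffer would be the SCSI data buffer itself, which is where the zero-copy appeal comes from.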

Especially when it comes to the API for the iSER stack, having a single codebase, with vendors writing hardware drivers instead of re-inventing the wheel with sockets, is the way to go. I believe the smart software RNICs of the future will direct RDMA traffic straight into host OS SCSI memory buffers and, as today, use something similar to sendpage() for TX.

Meanwhile, multi-core microprocessor designs with large, intelligent shared caches, and CPU cache-coherency and I/O interconnects that in the ’90s were only available in the Alpha EV67 and the highest of high-end shared-memory supercomputers and clusters, are now starting to become the norm.

Pushing software iSER to the next level and beyond is surely not going to happen with a 30-year-old API (sockets). Also, for the data-center story with a traditional tiered SAN architecture and a software case, the hybrid iWARP software stack on the initiator will not get a whole lot of interest until it can show improved performance and overhead acceptable relative to traditional iSCSI today. For third-generation IP storage stacks, typical multiport 1G workloads are what will really drive interest in areas where putting in a hardware RNIC will not be cost-feasible for some time.

But just as with traditional iSCSI, we can also scale software iSER down towards platforms with more modest computing resources, such as low-power wireless devices. Even for the type of mobile devices that IP storage services have been prototyped on today, the ability to scale server-side hardware RNICs more efficiently is not software iSER’s only benefit. On a side note, I think about the transparency that connection recovery in traditional iSCSI and iSER brings to inter-nexus multiplexing, as well as end-user requirements for configuration and management scenarios. Using an active-active recovery mechanism that is as close to completely transparent as possible (which ERL=2 is, IMHO) is, I think, what mobile IP storage users need to be demanding from their transports.

Thanks for listening!

Julian responds:

Great comments. You are all certainly aware that sockets are also undergoing transformation (asynchronous sockets), but even with synchronous sockets, and with some care not to break existing applications that use them, a restructuring of the stack may enable (as shown by the Intel and IBM Haifa work) great increases in performance.

Software RDMA for the new class of multicore engines is definitely an interesting proposition (on highly multithreaded engines it should come with no cost associated with it – or almost no cost).

I wish I knew more about the decrease in latencies in the switch fabric (it would be interesting if somebody could comment) as large Layer-2 fabrics have some inherent latency issues.

FCoE is asking us to forget all this, go back and pay the hardware price for several more years, and ignore IP-land – and nothing that I have heard has convinced me that we should do so.

Regards, Julo

Then comes this post from Silvano Gai, who does not seem at all happy that this discussion is taking place at IPS rather than at ANSI T-11, “where it belongs” —

Julo,

Quoting: FCoE is asking us to forget all this, go back and pay the hardware price for several more years, and ignore IP-land – and nothing that I have heard has convinced me that we should do so.

FCoE is not asking you (the ips WG) anything.

FCoE is a proposed item for the FC-BB-5 WG of T11. If you have concern that T11 is making a mistake, I suggest you move this discussion to the T11.3 reflector.

The FC-BB WG will meet the first time to discuss FCoE in Bloomington, MN Wednesday June 6th, 2007.

IMHO, it is a bit premature to discuss the limitations of a technology that is not yet public or defined.

— Silvano

My take:

FC originated at IBM, as one of two practical responses to the big fat blue parallel SCSI cable everyone kept tripping over when they walked behind the rack.  Together with SSA, IBM developed Fibre Channel as a serialization of SCSI to enable the wire to become thinner.  The guys involved were interviewed for my last book on storage and told me, point blank, that they had never intended FC to become a network protocol or to be used to interconnect servers and storage in any sort of network.  That FC SANs are called “storage area networks” at all is an oxymoron.

Zack Best is right, but for the wrong reasons.  He noted that “The real debate here is between two types of networks.”  Very true.  Neither FC SANs nor iSCSI SANs are technically networks at all.  From what I can glean (and I realize I will be flamed for this), FC is a channel protocol, not a network protocol.  iSCSI is an application that happens to use a TCP/IP network, but is not in truth its own network.  It is simply another application running over TCP/IP.  Am I wrong?

As for NAB’s comments, I believe that major improvements in iSCSI performance will be made with the addition of iSER/iWARP acceleration — delivering performance that far outstrips that of FC.  How soon in the current economy?  Who knows.

Bite my shiny metal ass! As for Gai’s remarks, I find myself remembering that often-repeated quote from Bender the Robot in the soon-to-be-renewed TV series Futurama: “Bite my shiny metal ass.”

The T-11 committee at ANSI, that marvelously vendor-manipulated standards body, is what gave us the crappy FC standards we have today — standards that can be implemented to the letter by different vendors in their switches with absolute certainty that competitors’ switches will not work together.

When is a standard not a standard?  When it doesn’t enable interoperability between standards-based products, for one thing.  If FC standards are so much more mature and have such a broad ecosystem of cooperating vendors, tell me why there are still interoperability plug fests for FC at the University of New Hampshire — a full decade after FC “standards” were first released by ANSI T-11. Shouldn’t we at least be beyond the point of plugging stuff together and crossing our fingers that everyone’s blinky lights will come on?

The only argument I have heard for continuing any development in the FC protocols at all, whether that development is done at T-11 or at IETF, is to provide the means to wean all the crack addicts in the Global 2000 off of FC fabrics altogether, and as soon as humanly possible. FC SAN is simply the most expensive way to host data that was ever invented.  No surprise that it came to market at a time when everyone was suspending disbelief and investing in dotcoms.

Who needs FC anyway?  I mean, think about it.  Windows apps don’t need it.  Apple apps don’t need it. (Despite products like My First SAN and SAN in a Box, SMBs don’t need it.)  So does that make it an enterprise play?  I wonder.  Most Oracle and SAP apps don’t need it.  In fact, if you show me applications that really do need the feeds and speeds of FC, I will likely suggest a suitable and intelligent replacement:  the mainframe.  At least in a mainframe world, you have management — you aren’t exposed to the machinations of the mavens of overpriced storage arrays, with their deliberate and damnable efforts to obfuscate common management and to lock consumers into a terrible downward spiral of cost.

There, I said it.  I feel much better now.


Howard Goldstein April 30, 2007 at 3:48 am

Jon,

As I mentioned earlier in my “Note from Howard,” FC over Ethernet is not a protocol that tries to make the case that it is a viable replacement for IP-based SANs. That is not the point at all. Those you referenced in the post above taking technical positions describing the “real” debate are also barking up the wrong tree. I see FCoE only in the context of a replacement for the IPS standard protocol FCIP. FCIP uses TCP/IP, versus UDP/IP or Ethernet directly. The intent of FCIP is to extend the connection between FC switches beyond native distance by taking advantage of a gateway approach on either side carrying traffic via TCP/IP. Through this extension it preserves the link-level nature of the link between two FC switches. The concept of a single FC fabric is also preserved for those who desire it.

Customers are interested in continued asset utilization. Like it or not, there is a heavy investment in FC networks, and in those cases the ability to extend an FC fabric over an IP infrastructure sometimes makes sense and continues to make sense. FCoE simply allows that to happen over existing Ethernet infrastructure without the need for TCP. Even with iSCSI-based solutions that use IP, the assumptions being made about the need for a sophisticated transport in this implementation are overstated. Although interesting, FCoE is not the jaw-dropping protocol that this entry makes it out to be. Using it as another opportunity to bash FC as a technology is fine, as this is your blog, but I feel that has been overdone.

What is interesting are the arguments made early in this post about the so-called limitations of Layer 2 networks. FC is actually a multi-OSI-layer technology. FC has transport recovery built in with FC4 Enhanced Recovery, even for those limited opportunities to address minimal failures without upper-layer SCSI involvement – if that involvement were such a bad thing anyway. The reference to “adequate” flow control over limited-span networks seems to imply that TCP sliding-window flow control is somehow better. It is also difficult to support these global-sized arguments when no one at this point needs that kind of connectivity as we look at SAN requirements. Very few installations need more than the power of 239 FC switches, any more than they need similarly large connectivity in the private IP SAN space. Stating that FCP endpoints are costlier than simple NICs is a statement of the obvious. An SUV costs more than a non-SUV. They are both viable transportation and can both meet most needs.
The “might be” argument of an error-rate increase going from 1 Gb to 10 Gb can be offset by the fact that we are doing more in less time as well. The argument for TCP in iSCSI in today’s networks is weak itself, and not necessarily the correct decision even if it was made by the community. Perhaps it is time to leave behind the community ecosystem after all.
FC is not just a Layer 2 switch network; in fact, it is a Layer 3 switch network. The FSPF protocol is almost exactly the OSPF used in IP routers. This is a common misconception people have about FC when they compare FC to Ethernet in a switch.
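
To make the FSPF/OSPF comparison concrete: both are link-state protocols in which each switch (or router) floods its link costs and then runs a shortest-path-first (Dijkstra) computation over the resulting topology database. Below is a minimal sketch of that shared SPF computation over an assumed four-switch fabric with illustrative link costs – not any particular product's behavior.

import heapq

def spf(topology: dict[str, dict[str, int]], root: str) -> dict[str, int]:
    """Dijkstra shortest-path-first over a link-state database:
    the core computation shared by OSPF routers and FSPF switches."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue                              # stale heap entry
        for neighbor, link_cost in topology[node].items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return dist

# Hypothetical four-switch fabric (domains A-D) with illustrative link costs
fabric = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1, "D": 5},
    "C": {"A": 4, "B": 1, "D": 1},
    "D": {"B": 5, "C": 1},
}
print(spf(fabric, "A"))   # {'A': 0, 'B': 1, 'C': 2, 'D': 3} -- every link usable

A spanning-tree bridged network, by contrast, would block redundant links outright to avoid loops, which is the Layer 2 limitation the earlier part of the thread was pointing at.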

The comment that makes the most sense in this thread is Silvano’s, with one addition. Rather than moving the discussion to the FC-BB-5 working group, this is a discussion for the IPS Working Group after all. I see all of this as an opportunity to revisit the basic iSCSI and FCIP decision that selected TCP to begin with. More and more we are looking at successful SoIP implementations that provide high performance with reduced cost. I see that as the center of the discussion here about FCoE from an implications point of view.

Howie Goldstein

Charles April 30, 2007 at 4:11 am

Very interesting, although most of it went over my head with all this talk about protocol internals. I’d just like to comment on the fact that FC was never intended for the application where we see it today, i.e., SANs. Is it not true that the really great applications of technology and invention were never intended – they just happened?
The truly great scientific landmarks weren’t “Eureka, I found it!”; they were more of the “Odd… that shouldn’t happen…” variety.

Who knows, FC might mature enough to be really useful. I don’t see iSCSI as the great enabler; to me it’s just a way to do block storage on the cheap. Why not use IP and Ethernet where they really shine, just as I believe you’ve pointed out, Jon, with Zetera.

Administrator April 30, 2007 at 12:23 pm

Howard,

I agree with almost everything you have said — most assuredly with the statement that IETF ought to revisit the viability of a UDP/IP interconnect. What I think we see here is a catfight between vendor communities (with their engineers participating) to push their preferred interconnect architectures. The real issue is what the business application requires, not what vendors want to sell.

If I were king of anything other than this blog — where, in fact, I am the court jester — I would look to build feature sets into transport protocols for functions like RAID and virtualization. As our mutually preferred interconnect vendor has shown, these things can be done elegantly in UDP/IP and can save everyone a boatload of cost on storage infrastructure.

Thanks for writing.

Charles, you are right on the money. Don’t be confused by the acronyms and shorthand of the engineers. This is politics, pure and simple.

Keith May 4, 2007 at 9:27 am

As a minor point of history, IBM did not originate fibre links to eliminate SCSI cables. What they produced was ESCON, which replaced the big blue pairs of channel cables, which in turn had replaced the even bigger pairs of gray channel cables.

Fibre Channel was based on ESCON technology, which then led to the turnaround of FICON, which is the ESCON protocol over Fibre Channel.

Administrator May 4, 2007 at 9:33 am

Thanks for your input Keith. It doesn’t jibe with what I was told by old IBMers I interviewed for my first book on storage back in the day. I know that ESCON replaced Bus and Tag (I was running data centers when that happened). However, this was one of two protocols developed to serialize SCSI — the other being SSA.

The evolution from ESCON to FICON is spot on.

I can only report what I have been told. Were you part of IBM’s FC dev effort before going over to EMC?

Howard Goldstein May 5, 2007 at 12:29 am

Jon and Keith,

My understanding is that Fibre Channel development came with the requirement to come up with a common standard protocol to serialize and standardize not only SCSI but ESCON as well. By using different Classes of Service, FC became an even better multiplexable transport for the ESCON storage protocol than ESCON itself was, in the same way that FC provides a better virtual cable than the parallel SCSI bus did for the SCSI storage protocol.

Its approach also allows the serialization and transport of many other protocols as well, such as avionics, Virtual Interface, and even IP over Fibre Channel through link encapsulation (rarely implemented).

Where some confusion comes in is that McData, now part of Brocade, was the company that OEMed the ESCON Director for IBM. Much of the hardware and firmware logic that McData used in the ESCON Director became a central part of their Fibre Channel Director as well. It is true that the physical-layer functionality of FC and ESCON uses similar approaches. Although one can legitimately say that Bus & Tag begat ESCON – going from one proprietary IBM protocol to another – it is not accurate to say that ESCON alone begat FC.

It is accurate to say that a group of engineers at IBM was key to the development of FC as a standard, which adds to this confusion.

Howie Goldstein

Marc Farley May 5, 2007 at 12:51 am

C’mon Jon, your memory is better than that. Fibre Channel was developed as a backbone network technology with a non-future as soon as Gigabit Ethernet came on the scene. At about the same time, IBM was developing SSA (Serial Storage Architecture) as the new, proprietary high-throughput disk drive interconnect for enterprise storage. SSA was fast and scalable and threatened to have a severe impact on Seagate’s high-margin enterprise SCSI disk drive business (drive margins have always been much higher on the enterprise side of the industry). Al Shugart and Seagate went looking for a technology to compete with SSA and found FC, a technology with a certain dead end as a backbone. They convinced the FC community to look at supporting a new connectivity model – the FC loop – which had an arbitration-based access method astoundingly similar to SCSI’s, making it possible to port SCSI drive firmware to the new FC drives in an incredibly short period of time. This was truly brilliant technology tinkering by Seagate. Then the marketing machine kicked in, and nearly all of IBM’s storage competitors jumped on the FC bandwagon and made a convincing story that FC would be open and cheaper than SSA. Some may recall the analyst and press debate over which technology would win – SSA or FC? Of course, FC won, and a whole lot of other things followed, including the emergence of FC fabrics for enterprise storage subsystems.

So now it’s getting pretty clear that the FC industry is not going to risk the development of 10 Gb FC silicon and instead is trying to figure out another way to keep the FC gravy train flowing – generating work for guys like Howie Goldstein and Zack Best (whoever he is – it’s obviously a fake ID). It’s a pretty funny ploy – “the network opportunity dried up, so let’s try to fool the market with protocol confusion.” The protocol arguments made miss the point completely – customers want fewer technologies to manage. That’s how they get the best leverage out of their resources.

Administrator May 5, 2007 at 10:10 am

Guys,

I’m only telling you the story that was told to me by IBMers. I don’t personally care why FC was developed. IBM clearly preferred SSA and donated FCP to ANSI.
