Following on from Hu Yoshida’s apparent blunder over FCoE and lossy networks, I thought I’d do my bit to clear things up and shed some light. Knowing a thing or two about FCoE, I’m regularly amazed at how little some people know...
So the following is a mini tutorial on flow control and lossless Ethernet in FCoE networks. If you don’t care about FCoE, then point your browser somewhere else and don’t come back 😉 On the other hand, if you are interested and want to know more, then put your feet up for 10 minutes and read on...
Not your grandmother’s Ethernet
FCoE requires an Enhanced 10 Gigabit Ethernet network – it does not run over standard gigabit Ethernet. Depending on who you speak to, this Enhanced Ethernet usually goes by one of two names –
1. Converged Enhanced Ethernet (CEE). Commonly used by IBMers and Brocadites.
2. Data Centre Ethernet (DCE). This is trademarked by Cisco and according to the Cisco website refers to the company’s architecture for next generation Ethernet in the Data Centre and is a superset of the DCB standards with some additions including L2MP.
As my wife often tells me that I live in my own little world, and to remain neutral, I'll refer to it simply as Enhanced Ethernet.
Whatever you decide to call it, it encompasses a collection of new technologies required in order for it to be able to transport FCoE traffic (frames). Although FCoE is not the only driving force behind Enhanced Ethernet, it is certainly a major force. Some of the technology changes encompassed in Enhanced Ethernet include the following –
• Priority Based Flow Control (PFC) / Lossless behaviour
• Low latency
• Improved aggregate bandwidth
• New link level negotiation protocols
• Congestion Notification
• Enhanced Transmission Selection
In this post I’ll concentrate on how lossless behaviour is implemented and achieved. To do this it’s best to start somewhere near the beginning –
It all goes back to SCSI
As the following very simple diagram shows, in FCoE networks, SCSI is encapsulated in FC frames, and FC frames are encapsulated in FCoE frames.
So indirectly, SCSI is encapsulated in FCoE frames, meaning that FCoE and the underlying Ethernet network must keep SCSI happy.
So how do we keep SCSI happy? If we cast our minds back, we should hopefully remember that SCSI was originally designed to be used within the confines of a physical server chassis, running uncontested over short parallel cables. Uncontested = zero contention. As a result SCSI was not designed to deal well with delays or transmission errors. In fact, when either occurs, SCSI deals with them poorly.
In a nutshell – drop frames carrying SCSI payloads without efficient recovery capabilities (further up the stack) and you will be in a world of hurt!
To keep SCSI happy you really need low latency and lossless behaviour.
What is a Lossless network?
Put very simply, a lossless network is a network that does not drop frames.
The opposite is a “lossy” network – a network that drops frames when congestion occurs. Your grandmother’s GigE (1Gbps full duplex) network is lossy: it drops frames under congestion.
Making Enhanced Ethernet Lossless
The Data Centre Bridging Task Group has decided to implement link layer flow control in Enhanced Ethernet via a mechanism called Priority-based Flow Control (IEEE 802.1Qbb), or PFC for short. PFC is how losslessness is achieved on Enhanced Ethernet networks. I might have just invented the word “losslessness” :-S
However, before we dig into PFC it is worth taking a side-step to briefly talk a little about Ethernet priorities.
Ethernet Priorities: IEEE 802.1p defines 8 priorities for Ethernet. These priorities allow for the implementation of Classes of Service at the link layer by tagging certain traffic types with an encoded priority. Implementing Classes of Service allows available bandwidth to be divided into logical lanes, or virtual links. These virtual links can then be leveraged by other protocols and services, such as Priority-based Flow Control, which we will talk about.
The diagram below shows how a physical link between two switches can be divided into 8 logical lanes/virtual links labelled CoS 0 through CoS 7 –
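For anyone who likes to see the bits, here is a minimal Python sketch of where that priority lives on the wire. The 802.1p priority is carried as the 3-bit Priority Code Point (PCP) in the 4-byte 802.1Q VLAN tag. The function names, and the priority and VLAN values used in the example, are my own illustrative choices and not taken from any particular product.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q VLAN tag


def build_vlan_tag(priority: int, vlan_id: int, dei: int = 0) -> bytes:
    """Pack an 802.1p priority and VLAN ID into a 4-byte 802.1Q tag."""
    if not 0 <= priority <= 7:
        raise ValueError("priority must be 0-7 (3 bits)")
    if not 0 <= vlan_id <= 0x0FFF:
        raise ValueError("vlan_id must fit in 12 bits")
    if dei not in (0, 1):
        raise ValueError("dei must be 0 or 1")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_vlan_tag(tag: bytes) -> dict:
    """Split a 4-byte 802.1Q tag back into its fields."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return {
        "priority": tci >> 13,       # 3-bit PCP: the 802.1p priority
        "dei": (tci >> 12) & 0x1,    # drop eligible indicator
        "vlan_id": tci & 0x0FFF,     # 12-bit VLAN ID
    }


# Hypothetical example: priority 3 on VLAN 100
tag = build_vlan_tag(priority=3, vlan_id=100)
fields = parse_vlan_tag(tag)
```

Three bits give exactly the 8 values (0 through 7) that the 8 Classes of Service map onto.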
PFC is “aware” of IEEE 802.1p Ethernet priorities and applies intelligence in the form of selective enforcement of the Ethernet PAUSE condition: the pause is applied to particular Classes of Service rather than to the whole link. This makes PFC perfect for converged unified fabrics where multiple traffic types and classes of service share a common network. It also makes PFC superior to native FC BB_Credits, which apply to all traffic on the link regardless of class.
Interestingly, while PFC achieves the same, albeit superior, results as FC BB_Credits – that of creating a lossless network – the technical specifics of the two implementations are vastly different. On the one hand FC BB_Credits require the sender to assume it cannot send frames until it explicitly knows that the receiver is ready. On the other hand PFC allows senders to assume that they can always send unless explicitly told not to.
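To make the difference between the two philosophies concrete, here is a deliberately simplified toy model in Python. It is a sketch under my own assumptions and not how any real switch works: time advances in ticks, the receiver frees one buffer slot every `drain_every` ticks, and pause signals take effect instantly (real links have frames in flight, so real buffers need headroom). All function and parameter names are hypothetical.

```python
def credit_based(frames_to_send, buffer_slots, drain_every):
    """BB_Credit style: the sender may only send while it holds a credit."""
    if buffer_slots < 1 or drain_every < 1:
        raise ValueError("buffer_slots and drain_every must be >= 1")
    credits = buffer_slots            # one credit per receiver buffer slot
    queued = sent = tick = max_queued = 0
    while sent < frames_to_send:
        tick += 1
        if credits > 0:               # no credit, no send
            credits -= 1
            queued += 1
            sent += 1
            max_queued = max(max_queued, queued)
        if tick % drain_every == 0 and queued > 0:
            queued -= 1
            credits += 1              # receiver hands a credit back
    return {"sent": sent, "dropped": 0, "max_queued": max_queued}


def pause_based(frames_to_send, buffer_slots, drain_every, high_water, low_water):
    """PFC style: the sender sends freely until it is told to pause."""
    if drain_every < 1 or not 0 <= low_water < high_water <= buffer_slots:
        raise ValueError("need 0 <= low_water < high_water <= buffer_slots")
    paused = False
    queued = sent = dropped = tick = max_queued = 0
    while sent < frames_to_send:
        tick += 1
        if not paused:                # assume we can send unless told otherwise
            if queued < buffer_slots:
                queued += 1
                max_queued = max(max_queued, queued)
            else:
                dropped += 1          # buffer overrun: the frame is lost
            sent += 1
        if tick % drain_every == 0 and queued > 0:
            queued -= 1
        if queued >= high_water:
            paused = True             # receiver signals pause
        elif queued <= low_water:
            paused = False            # receiver lifts the pause
    return {"sent": sent, "dropped": dropped, "max_queued": max_queued}


credit_result = credit_based(frames_to_send=20, buffer_slots=4, drain_every=3)
pause_result = pause_based(frames_to_send=20, buffer_slots=8, drain_every=3,
                           high_water=6, low_water=2)
```

In both runs nothing is dropped. The credit model gets there because the sender can never have more frames outstanding than the receiver has slots, while the pause model gets there because the receiver shouts "stop" before its buffer fills.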
Hitting the Pause Button
PFC enforces the Pause condition per Class of Service by issuing a PFC frame with 8 time fields, one for each previously mentioned CoS/Ethernet priority. When a switch issues PFC frames it is instructing the connected node to apply the PAUSE condition (i.e. stop sending) to frames with particular CoS values. The diagram below shows the PAUSE condition applied to all Classes of Service except CoS 3 –
As well as the classes to which it is applied, the duration of the pause is also specified within the PFC frame. This allows PFC to be selective about which classes of traffic it pauses, making it possible to enforce the Pause condition on a single class or on any subset of the 8 classes. It is also possible to selectively lift the pause condition early, such as when congestion has dissipated, so there is no need to wait until the Pause timeout expires. This adds to the functionality of PFC.
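To put some numbers on the pause duration: each time field is a 16-bit value counted in pause quanta, where one quantum is the time taken to transmit 512 bits at the speed of the link. The short Python sketch below does the arithmetic. The function name is mine, and the 10 Gbps default is simply the link speed this post assumes.

```python
QUANTUM_BITS = 512          # one pause quantum = 512 bit times
MAX_QUANTA = 0xFFFF         # largest value a 16-bit time field can hold


def pause_seconds(quanta: int, link_bps: int = 10_000_000_000) -> float:
    """Convert a PFC time field value into seconds on a link of the given speed."""
    if not 0 <= quanta <= MAX_QUANTA:
        raise ValueError("quanta must fit in 16 bits")
    return quanta * QUANTUM_BITS / link_bps


one_quantum = pause_seconds(1)             # 51.2 nanoseconds at 10 Gbps
longest_pause = pause_seconds(MAX_QUANTA)  # roughly 3.36 milliseconds at 10 Gbps
resume_now = pause_seconds(0)              # a time of 0 lifts the pause
```

So even the longest pause a single PFC frame can request on a 10 Gbps link is only a few milliseconds, and a time value of zero is how the pause gets lifted early.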
PFC frames are handled at the MAC layer, in much the same way that R_RDYs are handled by FC-1 in native FC networks. The PFC frame is a standard non-tagged MAC Control frame identified by Ethertype 0x8808 with Op-code 0x0101. For best performance and efficiency all good FCoE switches handle flow control in hardware.
In Enhanced Ethernet networks FCoE will be assigned to a Class of Service which in turn will be treated as high priority, so that when congestion starts to occur in the network (per switch) it will be allowed to continue to operate while other Classes of Service (protocols) might be paused.
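If you want to see what such a frame looks like as raw bytes, here is a rough Python sketch that builds and parses one. The Ethertype and Op-code are the values given above. The rest of the layout (the reserved MAC Control multicast destination address, a 2-byte priority enable vector, then the 8 two-byte time fields, padded to the 60-byte minimum without FCS) is my reading of the standard frame format, and the function names and source MAC are made up for illustration.

```python
import struct

PFC_DST_MAC = bytes.fromhex("0180c2000001")  # reserved MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101
MIN_FRAME_LEN = 60                           # minimum Ethernet frame, FCS excluded


def build_pfc_frame(src_mac: bytes, pause_quanta: dict) -> bytes:
    """Build a PFC frame. pause_quanta maps priority (0-7) to a 16-bit time value."""
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    enable_vector = 0
    times = [0] * 8
    for priority, quanta in pause_quanta.items():
        if not 0 <= priority <= 7:
            raise ValueError("priority must be 0-7")
        if not 0 <= quanta <= 0xFFFF:
            raise ValueError("quanta must fit in 16 bits")
        enable_vector |= 1 << priority       # mark this time field as valid
        times[priority] = quanta
    header = struct.pack("!HHH8H", MAC_CONTROL_ETHERTYPE, PFC_OPCODE,
                         enable_vector, *times)
    return (PFC_DST_MAC + src_mac + header).ljust(MIN_FRAME_LEN, b"\x00")


def parse_pfc_frame(frame: bytes) -> dict:
    """Return {priority: quanta} for every priority enabled in the frame."""
    ethertype, opcode, enable_vector = struct.unpack("!HHH", frame[12:18])
    if ethertype != MAC_CONTROL_ETHERTYPE or opcode != PFC_OPCODE:
        raise ValueError("not a PFC frame")
    times = struct.unpack("!8H", frame[18:34])
    return {p: times[p] for p in range(8) if enable_vector & (1 << p)}


# Pause every Class of Service except CoS 3 for the maximum time
src = bytes.fromhex("020000000001")          # made-up locally administered MAC
frame = build_pfc_frame(src, {p: 0xFFFF for p in range(8) if p != 3})
paused = parse_pfc_frame(frame)
```

The example pauses everything except CoS 3. Sending the same frame with the time fields set to 0 would lift the pause on those classes again.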
Net result – lossless behaviour for FCoE frames on shared Enhanced Ethernet networks (unified fabrics). All implemented at the link layer by the FCoE capable switches. Voila!
Comments and thoughts welcome.
Follow me on twitter @nigelpoulton. I only talk about storage and virtualisation etc….
I'm a freelance consultant and can be contacted at nigel at storage-strategist dot com