FCoE Lesson #1

By | September 23, 2009

Following on from Hu Yoshida’s apparent blunder over FCoE and lossy networks, I thought I’d do my bit to clear things up and shed some light.  Knowing a thing or two about FCoE, I’m regularly amazed at how little some people know…

So the following is a mini tutorial on flow control and lossless Ethernet in FCoE networks.  If you don’t care about FCoE, then point your browser somewhere else and don’t come back 😉  On the other hand, if you are interested and want to know more, then put your feet up for 10 minutes and read on…

Not your grandmother’s Ethernet

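FCoE requires an Enhanced Ethernet network – it does not run over standard Ethernet.  In practice that means 10GigE, although 10Gbps is not strictly a requirement; what matters is that the network is lossless.  Depending on who you speak to, this Enhanced Ethernet usually goes by one of two names –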
 

1.    Converged Enhanced Ethernet (CEE).  Commonly used by IBMers and Brocadites.

2.    Data Centre Ethernet (DCE).  This is trademarked by Cisco and, according to the Cisco website, refers to the company’s architecture for next generation Ethernet in the Data Centre.  It is a superset of the DCB standards with some additions, including L2MP.

As my wife often tells me that I live in my own little world, and to remain neutral, I'll refer to it simply as Enhanced Ethernet.

Whatever you decide to call it, it encompasses a collection of new technologies required in order for it to be able to transport FCoE traffic (frames).  Although FCoE is not the only driving force behind Enhanced Ethernet, it is certainly a major force.  Some of the technology changes encompassed in Enhanced Ethernet include the following –
 
•    Priority Based Flow Control (PFC) / Lossless behaviour
•    Low latency
•    Improved aggregate bandwidth
•    New link level negotiation protocols
•    Congestion Notification
•    Enhanced Transmission Selection

In this post I’ll concentrate on how lossless behaviour is implemented and achieved.  To do this it’s best to start somewhere near the beginning –


It all goes back to SCSI

As the following very simple diagram shows, in FCoE networks, SCSI is encapsulated in FC frames, and FC frames are encapsulated in FCoE frames. 

So indirectly, SCSI is encapsulated in FCoE frames, meaning that FCoE and the underlying Ethernet network must keep SCSI happy.
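
To make the nesting concrete, here is a rough Python sketch of the layering.  It is an illustration only: the helper names (wrap_fc, wrap_fcoe), the MAC addresses and the zero-filled FC header are made up, and the FCoE-specific header and trailer fields are left out.  The only real values are the 24-byte size of an FC frame header and the FCoE Ethertype 0x8906.

```python
import struct

FCOE_ETHERTYPE = 0x8906  # Ethertype that marks an Ethernet payload as FCoE


def wrap_fc(scsi_payload: bytes) -> bytes:
    """Wrap a SCSI payload in a stand-in FC frame.

    The 24-byte header is zero-filled here; a real FC header carries
    addressing and exchange fields that this sketch does not model.
    """
    fc_header = bytes(24)
    return fc_header + scsi_payload


def wrap_fcoe(fc_frame: bytes, dst_mac: bytes, src_mac: bytes) -> bytes:
    """Wrap an FC frame in an Ethernet frame tagged with the FCoE Ethertype.

    Simplification: the FCoE header/trailer (version, SOF, EOF) and the
    Ethernet FCS are omitted.
    """
    eth_header = dst_mac + src_mac + struct.pack("!H", FCOE_ETHERTYPE)
    return eth_header + fc_frame


# Hypothetical payload and locally administered MAC addresses.
scsi = b"SCSI command bytes"
fc = wrap_fc(scsi)
frame = wrap_fcoe(
    fc,
    dst_mac=bytes.fromhex("020000000001"),
    src_mac=bytes.fromhex("020000000002"),
)
```

The SCSI bytes end up two layers deep inside the Ethernet frame, which is why anything the Ethernet layer does to the frame (such as dropping it) is ultimately felt by SCSI.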

So how do we keep SCSI happy?  If we cast our minds back, we should hopefully remember that SCSI was originally designed to be used within the confines of a physical server chassis, running uncontested over short parallel cables.  Uncontested = zero contention.  As a result SCSI was not designed to deal well with delays or transmission errors.  In fact, when either occurs, SCSI deals with them poorly.

In a nutshell – drop frames carrying SCSI payloads without efficient recovery capabilities (further up the stack) and you will be in a world of hurt! 

To keep SCSI happy you really need low latency and lossless behaviour.

What is a Lossless network?

Put very simply, a lossless network is a network that does not drop frames. 

The opposite is a “lossy” network – a network that drops frames when congestion occurs.  Your grandmother’s GigE (1Gbps full duplex) network is lossy; it drops frames under congestion.

Making Enhanced Ethernet Lossless

The Data Centre Bridging Task Group has decided to implement link layer flow control in Enhanced Ethernet via a mechanism called Priority based Flow Control (802.1Qbb), or PFC for short.  PFC is how losslessness is achieved on Enhanced Ethernet networks.  I might have just invented the word “losslessness” :-S

However, before we dig into PFC it is worth taking a side-step to briefly talk a little about Ethernet priorities.
 

Ethernet Priorities:  IEEE 802.1p defines 8 priorities for Ethernet.  These priorities allow for the implementation of Classes of Service at the link layer by tagging certain traffic types with an encoded priority.  Implementing Classes of Service allows available bandwidth to be divided into logical lanes, or virtual links.  These virtual links can then be leveraged by other protocols and services, such as Priority based Flow Control which we will talk about.
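
As a small illustration of where that priority lives on the wire: the 802.1p priority is the 3-bit Priority Code Point (PCP) at the top of the 16-bit field in an 802.1Q VLAN tag.  The sketch below packs and unpacks that tag in Python.  The function names and the example VLAN ID are my own, chosen for illustration.

```python
import struct

TPID_8021Q = 0x8100  # Ethertype value that signals an 802.1Q VLAN tag


def build_vlan_tag(priority: int, vlan_id: int, dei: int = 0) -> bytes:
    """Pack a 4-byte 802.1Q tag: TPID, then PCP (3 bits), DEI (1 bit), VID (12 bits)."""
    if not 0 <= priority <= 7:
        raise ValueError("priority must be 0..7")
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("vlan_id must be 0..4095")
    if dei not in (0, 1):
        raise ValueError("dei must be 0 or 1")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def read_priority(tag: bytes) -> int:
    """Extract the 3-bit priority (CoS) from a 4-byte 802.1Q tag."""
    _tpid, tci = struct.unpack("!HH", tag)
    return tci >> 13


# Hypothetical example: CoS 3 on VLAN 100.
tag = build_vlan_tag(priority=3, vlan_id=100)
```

Three bits give exactly eight values, which is where the 8 priorities (CoS 0 through CoS 7) come from.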

The diagram below shows how a physical link between two switches can be divided into 8 logical lanes/virtual links labelled CoS 0 through CoS 7 –

PFC is IEEE 802.1p Ethernet priorities “aware” and applies intelligence in the form of selective enforcement of the Ethernet PAUSE condition.  This is where the pause condition is selectively applied to particular Classes of Service.  This makes PFC perfect for converged unified fabrics where multiple traffic types and classes of service share a common network.  It also makes PFC superior to native FC BB_Credits which can only apply arbitrary conditions that affect all traffic on the link.

Interestingly, while PFC achieves the same, albeit superior, results as FC BB_Credits – that of creating a lossless network – the technical specifics of the two implementations are vastly different.  On the one hand FC BB_Credits require the sender to assume it cannot send frames until it explicitly knows that the receiver is ready.  On the other hand PFC allows senders to assume that they can always send unless explicitly told not to.
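
The difference in the two assumptions can be sketched as a pair of toy sender models in Python.  These classes are illustrative only (the names CreditSender and PauseSender are invented for this sketch) and ignore everything except the basic "may I send?" decision.

```python
class CreditSender:
    """BB_Credit style: may send only while it holds credits from the receiver."""

    def __init__(self, credits: int):
        self.credits = credits  # credits granted up front by the receiver

    def try_send(self) -> bool:
        if self.credits == 0:
            return False        # no credit, so assume the receiver is not ready
        self.credits -= 1       # each frame sent consumes one credit
        return True

    def on_r_rdy(self) -> None:
        self.credits += 1       # receiver freed a buffer and returned a credit


class PauseSender:
    """PFC style: may always send unless explicitly told to stop."""

    def __init__(self):
        self.paused = False

    def try_send(self) -> bool:
        return not self.paused

    def on_pause(self) -> None:
        self.paused = True

    def on_resume(self) -> None:
        self.paused = False
```

With the credit model silence from the receiver means "stop", whereas with the pause model silence means "carry on".  Both aim to ensure the receiver is never sent a frame it has no buffer for.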

Hitting the Pause Button

PFC enforces the Pause condition per Class of Service by issuing a PFC frame with 8 time fields, one for each previously mentioned CoS/Ethernet priority.  When a switch issues PFC frames it is instructing the connected node to apply the PAUSE condition (i.e. stop sending) to frames with particular CoS values.  The diagram below shows the PAUSE condition applied to all Classes of Service except CoS 3 –

As well as the classes to which it is applied, the duration of the pause is also specified within the PFC frame.  This allows PFC to be selective in which class of traffic it will apply the Pause condition to, making it possible to enforce the Pause condition on a single, or a subset of all 8 classes.  It is also possible to selectively lift the pause condition, such as when congestion has dissipated and there is no need to wait until the Pause timeout expires.  This adds to the functionality of PFC.
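
A minimal Python sketch of how a port might track this per-class pause state is shown below.  The class and method names are hypothetical.  One detail not covered above is that pause times are expressed in quanta of 512 bit times, which works out to roughly 51.2 ns per quantum at 10Gbps.

```python
PAUSE_QUANTUM_BITS = 512  # one pause quantum is 512 bit times


class PfcPortState:
    """Tracks, per Class of Service, until when transmission is paused."""

    def __init__(self, link_bps: float):
        self.quantum_seconds = PAUSE_QUANTUM_BITS / link_bps
        self.pause_until = [0.0] * 8  # one deadline per CoS 0..7

    def on_pfc(self, now: float, quanta_by_class: dict) -> None:
        """Apply a received PFC frame; a time of 0 lifts the pause for that class."""
        for cos, quanta in quanta_by_class.items():
            self.pause_until[cos] = now + quanta * self.quantum_seconds

    def may_send(self, cos: int, now: float) -> bool:
        return now >= self.pause_until[cos]


# Pause every class except CoS 3 for the maximum time (0xFFFF quanta).
port = PfcPortState(link_bps=10e9)
port.on_pfc(now=0.0, quanta_by_class={cos: 0xFFFF for cos in range(8) if cos != 3})
```

After this call CoS 3 keeps flowing while the other seven classes wait.  Sending a later PFC frame with a time of 0 for a class releases it immediately rather than waiting for the timer to run out.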

PFC frames are handled at the MAC layer, similar to how R_RDYs are handled by FC-1 in native FC networks.  The PFC frame is a standard non-tagged MAC Control frame identified by Ethertype 0x8808 with Op-code 0x0101.  For best performance and efficiency all good FCoE switches handle flow control in hardware.
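
For the curious, the frame itself can be sketched in a few lines of Python.  The Ethertype and Op-code are the values given above.  The destination address (the reserved MAC Control multicast address 01:80:C2:00:00:01), the 16-bit class-enable vector and the padding to the 60-byte Ethernet minimum are details I have added that the text above does not cover, and the function name and source MAC are made up.

```python
import struct

MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101
# Reserved multicast address used for MAC Control frames.
MAC_CONTROL_DEST = bytes.fromhex("0180c2000001")
MIN_FRAME_NO_FCS = 60  # minimum Ethernet frame size, excluding the 4-byte FCS


def build_pfc_frame(src_mac: bytes, pause_quanta: dict) -> bytes:
    """Build a PFC frame (without FCS) from a {cos: pause_time_in_quanta} mapping."""
    enable_vector = 0
    times = [0] * 8
    for cos, quanta in pause_quanta.items():
        enable_vector |= 1 << cos   # bit n set: time field n is valid
        times[cos] = quanta         # 0 means "resume now"
    body = struct.pack("!HH8H", PFC_OPCODE, enable_vector, *times)
    header = MAC_CONTROL_DEST + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    return (header + body).ljust(MIN_FRAME_NO_FCS, b"\x00")


# Pause all classes except CoS 3, using a hypothetical source MAC.
pfc = build_pfc_frame(
    src_mac=bytes.fromhex("020000000001"),
    pause_quanta={cos: 0xFFFF for cos in range(8) if cos != 3},
)
```

Because the whole frame is a fixed, tiny layout with no variable-length fields, it is easy to see why it lends itself to being generated and acted on in hardware.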

In Enhanced Ethernet networks FCoE will be assigned to a Class of Service which in turn will be treated as high priority, so that when congestion starts to occur in the network (per switch) it will be allowed to continue to operate while other Classes of Service (protocols) might be paused.

Net result – lossless behaviour for FCoE frames on shared Enhanced Ethernet networks (unified fabrics).  All implemented at the link layer by the FCoE capable switches.  Voila!

Comments and thoughts welcome.

Nigel

Follow me on twitter @nigelpoulton.  I only talk about storage and virtualisation etc….

I'm a freelance consultant and can be contacted at nigel at storage-strategist dot com

23 thoughts on “FCoE Lesson #1”

  1. mike

    Nice tutorial. Very accessible. So can you tackle TRILL next? LOL. No seriously, can you? 🙂

  2. Brad Hedlund

    Nice write up Nigel. I think it’s also important to point out that all of these “Enhanced Ethernet” mechanisms (PFC, ETS, etc.) are useless if not implemented on a “Lossless Ethernet” switch, a switch that can forward traffic from ingress to egress and guarantee no packet drops, no head of line blocking, such as Cisco Nexus/UCS. Put another way, simply implementing the “Enhanced Ethernet” features alone on your switch does not create a lossless Ethernet network.

    Cheers,
    Brad

  3. John Dias

    Nigel, there’s about 100 pages of text book in that post – where can I learn more?

  4. Nigel Poulton

    Mike,

    If I get the time I will definitely put something up about TRILL.

    Brad,

    Good clarification.  I will also be updating the post to clarify that 10Gbps is not a requirement; 1Gbps DCE would work fine too, it’s more about being lossless…

    John,

    I’m considering putting together an FCoE Theory guide.  It seems a lot of people are interested, and since I know a little bit about it (I stress little) I’m thinking of putting something together to share with everyone.

    Thanks to everyone on Twitter too for passing on the word about this post and the kind comments.

    Nigel

  5. David Vellante

    Very informative post. Question… so in THEORY, if the switch vendors/card vendors, etc. supported it, could the industry implement pause frames in iSCSI and allow me to avoid the complexity of an FCoE transition, and maybe even completely eliminate my FC infrastructure? In other words, bring the QoS capabilities of FC to iSCSI, using pause frames, and allow me to get rid of my complex FC network? Thanks.

  6. Nigel Poulton

    Hi Dave,

    Thanks for stopping by.

    I’m open to being wrong on this, I am far from an expert on iSCSI.  However, my gut feeling is No.  For the following reasons (all off the top of my head – I might think a little more about this tonight, sadly I have a day job that requires my very limited brain power right now) –

    First up, FCoE networks consist of two protocols –
    1.    FCoE – Ethertype 0x8906
    2.    FIP – Ethertype 0x8914

    As you can see from above, FCoE has its own unique Ethertype stating that the protocol encapsulated within the Ethernet frame is FCoE.  It is this Ethertype that is required to implement PAUSE.  iSCSI on the other hand carries native SCSI over IP networks (encapsulated within standard IPv4 packets).  Therefore, by giving iSCSI priority on the network you are actually giving all IP (IPv4 packets – Ethertype 0x0800) traffic priority.  Whereas with FCoE you pause other traffic and only allow Ethertype=FCoE to use the bandwidth…  Am I making sense?   Updated in comment further down.

    Then there are other things such as the fact that cabling for FCoE is rated with a similar bit error rate to FC and that FCoE switches switch FCoE frames in hardware using cut-through mode.  All in line with FC.  iSCSI on the other hand is sent over standard Cat5 type cables, may be switched in software and using slower store and forward mechanisms.  There is no doubt more as well.

    I’ll have a think about it in the car tonight and might write a post on it.  It’s an interesting question you pose, but as I studied FCoE I remember constantly thinking that a lot of effort had gone into making it fast and simple – like FC…  Targeted at a different segment than iSCSI.

    Nige

  7. Stuart Miniman

    Nigel, great discussion of PFC. I second the call for also covering ETS, TRILL and Congestion Management if you’re up for it. On the 1Gb/10Gb, while the Ethernet Enhancements can be on 1Gb, it’s worth noting that to date, no vendor has any products (or plans that I’m aware of) for 1Gb FCoE, so in reality, FCoE will start with 10GbE.  Not sure how much value the enhancements have at 1Gb since you are likely to run into bandwidth constraints that limit how much convergence is feasible.  iSCSI can run with 1Gb or 10Gb and take advantage of the new enhancements to Ethernet.  My answer to Dave would be that FC customers are much more likely to adopt FCoE than rip & replace w/ iSCSI; iSCSI customers can move up to 10GbE and won’t be interested in FCoE.

  8. Nigel Poulton

    Stuart, thanks for joining in. 

    Initially I had stated that 10GigE was a requirement as I also know of no implementations or plans on 1Gbps.  However, to be technically accurate I crossed the statement out.  ianf picked me up on it 😉

    Agree with your last point re FCoE versus iSCSI.  FCoE was built from the ground up as a high speed Data Centre based protocol.  iSCSI was not.  I’m not knocking iSCSI, it has its place, that place is just not competing against FCoE.

  9. Nigel Poulton

    Forgot to mention that I will definitely pick up ETS, TRILL and the likes.  Tonight I’m hoping to throw something up on cables and infrastructure at the request of @ewantoo.  I’ve got a long flight to Colorado Springs at the weekend so will put something together on ETS etc…

  10. Martin G

    If anyone wants to play with FCoE, can I suggest http://www.opensolaris.org/os/project/fcoe/announcements and http://www.open-fcoe.org/ You can start building your own FCoE arrays and having a play.

  11. mike

    Nigel, the reason I was asking about TRILL is twofold. First to understand it better, but secondly, is there anything other than multi-pathing that can’t be achieved via link aggregation and the DCB functions you have already described? Thanks again.

  12. Ahmad

    Nice and simple..
    One protocol to run DC!  Trials to run IP over ATM with LANE failed, IP over FC failed (with some exceptions for inband communications)
    Why do you think vendors did not continue the effort of running SCSI directly over lossless Ethernet without encapsulating it into another layer (FC) (something that has been tried in parallel to iSCSI)?
    Is it because of the rich services offered by FC? Is it that vendors and customers want to integrate with the legacy FC?
    Nice work!
     
     
     

  13. Nigel Poulton

    Dave,

    I need to clear up a mistake I made in my earlier reply to you.  Ethertype is NOT required to implement PAUSE.  Brad Hedlund from Cisco got me doubting what I had said earlier (admittedly off the cuff), and after checking my notes I can confirm that iSCSI can indeed benefit from Ethernet PAUSE and the lossless network that it creates…

    PAUSE frames issued on the network list the Priorities that the PAUSE condition refers to (this can be 0 through 7).  Each Ethernet frame is allocated to a CoS (again 0-7) according to 802.1p.  This CoS is carried in the VLAN tag, which contains a subfield referred to as the Priority Code Point (PCP).

    In a nutshell, iSCSI can benefit from features in Enhanced Ethernet.  Still don’t think it competes in the same high end space as FC and FCoE though…

    Some other things I thought of that make FCoE more performance centric –

    No length field so does cut through switching (low latency)
    FCoE frames are never fragmented/segmented, making encapsulation very fast and efficient
    No interaction with TCP/IP

    Nigel

  14. Alex Sons

    Hi Nigel, I would be very interested in a comparison of FCoE and iSCSI on a DCN/FCoCEE. Likely the iSCSI protocol would need some tweaking in order to compete with FCoE? But the performance of iSCSI should be better (less overhead) and it would be much simpler as there is no SAN to be set up/managed anymore. Alex

  15. David

    Nigel,
    With TCP/IP, the TCP layer takes care of Flow Control and Retries. You explained how Flow Control is implemented in FCoE. Without a TCP layer, how does Enhanced Ethernet take care of retries? Or is it done by a different layer in the stack?
    David.

  16. Nigel Poulton

    John Dias,

    I plan on making this a series of posts on various aspects of FCoE and the underpinning technologies.  So to answer your question….. hopefully this site will help you learn more.  Keep an eye out for more related posts in the future.

    Alex,

    I’ll put together something on FCoE versus iSCSI on DCE/CEE.  So many things to discuss!

    David,

    It will become evident and be discussed in further posts, but for now the secret sauce is in the losslessness of the Enhanced Ethernet.  Congestion should not cause drops, and therefore FCoE frames are not dropped.  I know frame drops can occur for other reasons, such as CRC errors from transmission errors.  However, with FCoE networks being short runs within the DC, over cables etc. rated with excellent bit error rates, these are not a factor.  ETS allows for bandwidth allocation etc.  But I plan on discussing them all in detail in future posts…

    Nigel

  17. Fridge

    Nigel, thanks for the great post(s) on FCoE.  I am in the act of trying to determine where to go next with our virtualization infrastructure and as I look at various technologies & options this has been very helpful.  Looking into the crystal ball can be very foggy at times.

  18. hugo

    Hi Nigel

    Read through your post earlier this week.  Found it very enlightening, thanks for taking the time to share it.

    Saw some requests for TRILL info in the comments, and happened across this today:

    http://etherealmind.com/trill-introduction-review-overview-why-what-how/

    Maybe that’ll satisfy some of the requests until you have a chance to get to your article on TRILL 🙂

    I find it fascinating that things I’ve been considering Layer 3 issues (multipathing, routing protocols) are moving down into Layer 2.  Perhaps that merely shows my ignorance 😉

    Regards

  19. Etherealmind

    A small addition on TLAs. DCE is Cisco’s trademarked term, CEE is IBM’s trademarked term and the IEEE term is Data Centre Bridging or DCB. You should use this term when speaking generically.

    Secondly, you do NOT REQUIRE a DCB enabled switch to run FCoE, it will work on any Ethernet switch, but you won’t have any FC functions. The FC switch functions still need to be somewhere in your network. But for best results, you SHOULD run an FCoE enabled switch to get the supposedly new QoS mechanisms, and a non blocking switch fabric (as Brad talks about).

    Oh, and for the record, I regard FCoE as a transition mechanism for legacy FC storage to move to fully IP (aka iSCSI), over the next five years. As such, it is likely to be an interim and short lived technology and I won’t be spending much time on it. There is a lot of hype around it from the so-called Cloud Computing boosters, not a lot of adoption or use, and I still question whether it will last. Cisco has duped Brocade into playing in the FCoE market, thus boosting sales of routers and switches and avoiding having to be Number 2 in the Storage market.

  20. Nigel Poulton

    @Etherealmind

    Thanks for adding to the discussion.

    I’m pretty sure I mention DCB in a couple of my posts. However, I don’t use the term when referring to DCE/CEE for the same reason I don’t refer to FCoE as FC-BB_E… OK, it’s not exactly the same thing we’re talking about, but I think the term Enhanced Ethernet works and helps the conversation flow better.

    BTW I was not aware that CEE was an IBM trademark – are you sure about that?  I see that Cisco are no longer using the term DCE due to the confusion it causes and are now using CEE.  Suppose I will start using CEE now.

    Interesting point re FCoE being a transition technology for moving so called legacy FC to iSCSI. I’m not totally opposed to that train of thought, although I know EMC have announced native FCoE connectivity for CLARiiON expected next year, so I’m not so sure I’d bet the house on it.

  21. Calypso

    I don’t understand why FCoE is such a hype now. It’s something Cisco is forcing, at least I see it that way.

    This protocol has got only 10-15% of useful payload, so basically what you get on 10GbE link is effectively only 1-1.5Gbps useful data bandwidth. You need FCoE, FC and SCSI offload to get this data out. To me, this seems to be one really very hot microprocessor.

    On the other hand, if you have 8Gbps FC, you get around 40-50% useful payload, which gets you 3-4Gbps bandwidth.

    FCoE is just some kind of mixture protocol that’s trying to be some sort of unified protocol, but its problem is that it consists of 3 different protocols (IP, FC, SCSI) just enlarging its frame size with unnecessary information. This is a step backwards.

    What we’d like to see is a unified protocol that has got all the good stuff from all 3 mentioned protocols. Cheap as IP, lossless as FC and whatever SCSI is good for. 🙂 And the most important part – to be compatible with all mentioned protocols as well as efficient in terms of having high payload ratio.

  22. Nigel Poulton

    Calypso,
    Thanks for the comment.
    I will dedicate a post to responding to you.  Especially re the following comment –
    "FCoE is just some kind of mixture protocol that’s trying to be some sort of unified protocol, but its problem is that it consists of 3 different protocols (IP, FC, SCSI) just enlarging its frame size with unnecessary information. This is a step backwards."
    Watch this space.
    Also, can you explain more around what you mean when you say FCoE only has around 10-15% useful payload?  I tend to see FCoE as having a framing efficiency similar to FC.  I will cover that too.
    Thanks for dropping by and joining the conversation and stay tuned for a detailed response.
    Nigel

  23. Swamy

    Very good description of PFC and how lossless Ethernet is achieved in CEE. Thanks.
