<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-mmm-bfd-on-lags-03" ipr="trust200902">
  <front>
    <title abbrev="BFD for LAG Interfaces">Bidirectional Forwarding Detection
    (BFD) on Link Aggregation Group (LAG) Interfaces</title>
   

    <author fullname="Manav Bhatia" initials="M." surname="Bhatia"
	role="editor">
      <organization>Alcatel-Lucent</organization>
      <address>
        <postal>
          <street></street>
          <city>Bangalore</city>
          <code>560045</code>
          <country>India</country>
        </postal>
        <email>manav.bhatia@alcatel-lucent.com</email>
      </address>
    </author>

    <author fullname="Mach(Guoyi) Chen" initials="M." surname="Chen"
	role="editor">
      <organization>Huawei Technologies</organization>
      <address>
        <postal>
          <street>Q14 Huawei Campus, No. 156 Beiqing Road, Hai-dian
          District</street>
          <city>Beijing</city>
          <code>100095</code>
          <country>China</country>
        </postal>
        <email>mach@huawei.com</email>
      </address>
    </author>

    <author fullname="Sami Boutros" initials="S." surname="Boutros"
	role="editor">
      <organization>Cisco Systems</organization>
      <address>
        <email>sboutros@cisco.com</email>
      </address>
    </author>

    <author fullname="Marc Binderberger" initials="M." surname="Binderberger"
	role="editor">
      <organization>Cisco Systems</organization>
      <address>
        <email>mbinderb@cisco.com</email>
      </address>
    </author>

    <author fullname="Jeffrey Haas" initials="J." surname="Haas"
	role="editor">
      <organization>Juniper Networks</organization>
      <address>
        <email>jhaas@juniper.net</email>
      </address>
    </author>
    <date day="1" month="February" year="2012" />

    <abstract>
   <t>This document proposes a mechanism to run BFD on Link Aggregation
   Group (LAG) interfaces.  It does so by running an independent BFD
   async session on every LAG member link.</t>

   <t>The mechanism allows to verify the link connectivity, either in
   combination with or in absence of LACP, with a shorter detection time
   than what LACP offers.  The connectivity check can also cover
   elements of layer 3.</t>

   <t>A dedicated well-known UDP port is introduced that all BFD packets
   over member links use to disambiguate those from ordinary BFD
   packets.</t>
    </abstract>

    <note title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119">RFC 2119</xref>.</t>
    </note>
  </front>

  <middle>
    <section title="Introduction">
   <t>The Bidirectional Forwarding Detection (BFD) protocol 
   <xref target="RFC5880" />
   provides a mechanism to detect faults in the bidirectional path
   between two forwarding engines, including interfaces, data link(s),
   and to the extent possible the forwarding engines themselves, with
   potentially very low latency. BFD protocol provides as well a fast 
   mechanism for detecting communication failures on any data links and 
   the protocol can run over any media and at any protocol layer.</t>

   <t>Link aggregation (LAG) as defined in
   <xref target="IEEE802.1AX" /> provides 
   mechanisms to combine multiple physical links to a single logical 
   link. The goal is to provide higher bandwidth with the aggregated 
   logical link and better resiliency, since if one of the physical member 
   links fails, the aggregate logical link can continue to forward traffic 
   over the remaining operational physical member links.</t>

   <t>Currently Link aggregation control protocol (LACP) is used to detect 
   failures on a per physical member link. However, the use of BFD for 
   failure detection would (1) provide a faster detection (2) provide 
   detection in the absence of LACP (3) and would be able to verify L3 
   connectivity per member link.</t>

   <t>Running a single BFD session over the LAG virtual port, and without 
   internal knowledge of the LAG virtual port physical members, will 
   make it impossible for BFD to guarantee a detection of the physical 
   member link failures.</t>

   <t>The goal is to verify every member link connectivity.</t>

   <t>The approach is to run BFD async session over each member link and make 
   BFD control whether the member link should be part of the L2 Loadbalance 
   table of the LAG virtual port in the presence and in the absence of
   LACP.</t>

   <t>This document describes how to establish a BFD async session per physical
   member link of the Link Aggregation (LAG) virtual port.</t>

   <t>While there are native Ethernet mechanisms to detect failures
   (802.1ax, .3ah) that could be used for LAG, the solution proposed in
   this document enables operators who have already deployed BFD over
   different technologies (e.g.  IP, MPLS) to use a common failure
   detection mechanism.</t>
    </section>

    <section title="BFD on LAG member links">
   <t>The mechanism proposed for a fast detection of LAG member link 
   failure is running BFD async sessions on every LAG member link.
   We call these per LAG member link BFD sessions as micro BFD sessions in the rest 
   of this document.</t>

      <section title="Micro BFD async session address family">
   <t>Only one address family MUST be used for all micro BFD async sessions 
   running on all LAG member links. i.e. all member link async BFD 
   sessions MUST either use IPv4 or IPv6.</t>
      </section>

      <section title="Micro BFD async session negotiation">
   <t>A single micro BFD session runs on each member link of the LAG.
   The micro BFD async sessions negotiation MUST follow the same procedures 
   defined in <xref target="RFC5880" /> and
   <xref target="RFC5881" />.</t>

   <t>Only asynchronous mode is considered in this document.  The echo
   function is outside the document's scope.  At least one system MUST
   take the Active role (possibly both).  The micro BFD sessions on the member 
   links are independent BFD sessions.  They use their own unique, 
   local discriminator values, maintain their own set of state variables 
   and have their own independent state machine.  Timer values MAY be 
   different, even among the micro BFD sessions belonging to the same LAG 
   virtual port, although it is expected that micro BFD sessions belonging to 
   the same LAG virtual port use the same timer values.</t>

   <t>The demultiplexing of a received packet is solely based on the Your
   Discriminator field, if this field is nonzero.  For the initial Down
   BFD packets of a BFD session this value may be zero.  In this case
   demultiplexing MUST be based on some combination of other fields
   which MUST include the interface information of the member link.</t>

   <t>When receiving a BFD packet for a micro session with a 
   valid, non-zero Your Discriminator, a check MUST be done if the packet 
   was received on the correct member link interface.  If the check fails
   then the packet MUST be discarded.  This needs to be done before
   state variables for the BFD sessions are updated by the received
   packet.</t>

   <t>The BFD Control packets for each micro BFD session are 
   IP/UDP encapsulated as defined in
   <xref target="RFC5881" />, but with a new UDP 
   destination port "BfdBndlPort" (to be assigned by IANA). Control 
   packets use a destination IP address that is the peer's remote IP 
   address. The details of how this destination IP address is learnt 
   is outside the scope of the document .</t>

   <t>On Ethernet-based LAG member links the destination MAC is a 
    dedicated MAC address (to be assigned by IANA according to
    <xref target="RFC5342" />)  to be the immediate next hop.
   
   This will be used for the initial BFD down packets of a 
   session, while sending BFD Init and Up packets with the MAC address 
   learned from the received BFD packets.</t>

   <t>On Ethernet-based LAG member links the source MAC is the embedded MAC
   address of the port transmitting the packet.</t>

   <t>This mechanism helps to reduce the use of additional MAC addresses,
   which reduces the requirements for the Ethernet hardware on the
   receiving port.</t>
      </section>     
	</section>

    <section title="LAG Management Module">
   <t>The LAG Management Module (LMM) could be envisaged as a client of
   BFD, i.e. the LMM requests a micro BFD session per member link.</t>

   <t>LMM then uses the micro BFD session state, in addition to LACP state, 
   to monitor the health of the individual members links of the LAG.</t>

   <t>The micro BFD session for a particular port MUST be 
   requested when the port is attached to an aggregator.  The session 
   MUST be deleted when the port is detached from the aggregator.</t>

        <section title="BFD in the presence of LACP">
   <t>Once LACP has determined that a link is suitable for aggregation 
   within its selected LAG, and has completed negotiations with the 
   partner device so as to bring that link to Distributing state, at 
   that point the micro BFD session can be started on the link.</t>
   
   <t>BFD, as a layer 3 protocol, is viewed as running across the LAG, with 
   load balancing constraints ensuring particular micro BFD sessions are 
   effectively bound to particular member links.</t>

   <t>Although the link is in LACP Distributing state it should not be used 
   for carrying traffic other than the micro BFD session.  BFD is used to verify 
   that a link is ready to be an active member of its LAG for the purpose 
   of carrying LAG level data traffic.  Only when the micro BFD session is up 
   should the link become active for forwarding general traffic over the 
   bundle.</t>
   
   <t>In all cases where a micro BFD session goes down the corresponding link is 
   removed from the active set of members of the bundle.</t>
   
   <t>LACP remains in Distributing state as BFD runs above the LACP layer but 
   the link is inactive until such time as the micro BFD session is restored.  
   This applies also to the case where a BFD session is started on an 
   existing active link and the BFD session never comes up.</t>
        </section>

		<section title="BFD in the absence of LACP">
   <t>Use of LACP for link aggregation, is strictly optional. It is equally 
   possible to use no aggregation control protocol and to step directly 
   from the layer 1 or layer 2 OAM state becoming operational to starting 
   the micro BFD session.</t>
		</section>

		<section title="Handling Exceptions">
   <t>A possible exception to this would apply if the BFD over LAG feature 
   were provisioned after the link was already active within the LAG.  In 
   such cases it may be desirable to retain the link in its active status 
   to avoid disruption to traffic forwarding while the micro BFD session is brought 
   up.</t>

   <t>Another exception is when the BFD over LAG feature is un-provisioned
   after links were already BFD validated and active within the LAG.  In
   such case, it may be desirable to retain those links in their active
   status to avoid disruption to traffic forwarding since expectation is
   that the peer device will soon un-provision the BFD over LAG
   feature.</t>

   <t>Note that if one device is not operating a micro BFD session on a link, while 
   the other device is and perceives the session to be down, this will result 
   in the two devices having a different view of the status of the link.  
   This would likely lead to traffic loss across the LAG.</t>
   
   <t>The use of another protocol to bootstrap BFD can detect such mismatched 
   config, since the side that's not configured can send a rejection
   error.</t>
		</section>
	</section>

    <section title="BFD on LAG members and layer-3 applications">
    <t>Layer 3 protocols like e.g.  OSPF may use a micro BFD session to 
    detect failures on the LAG virtual port, or may establish a new BFD session 
    over the logical LAG virtual port.</t>
	</section>

    <section title="Detecting a member link failure">
    <t>When a micro BFD session goes down then this member link 
    MUST be taken out of the LAG L2 load balance table.</t>
    </section>

    <section title="Security Consideration">
    <t>This document does not introduce any additional security issues and
    the security mechanisms defined in 
    <xref target="RFC5880" /> apply in this document.</t>
    </section>

    <section anchor="IANA" title="IANA Considerations">

    <t>The IANA is requested to assign a well-known port number for the
    UDP encapsulated micro BFD sessions. IANA is also requested to
    assign a dedicated MAC address according to RFC&nbsp;5342
    <xref target="RFC5342" />.</t>
    </section>

   <section anchor="Acknowledgements" title="Acknowledgements">
    <t>We would like to thank Dave Katz, Alexander Vainshtein,
    Greg  Mirsky and Jeff Tantsura for their comments.</t>

    <t>The initial event to start the current discussion was the
    distribution of draft-chen-bfd-interface-00.</t>
    </section>

	<section title="Contributing authors">

    <t>Paul Hitchen
    <vspace blankLines="0" />
	BT
    <vspace blankLines="0" />
    Email: paul.hitchen@bt.com</t>

	<t>George Swallow
    <vspace blankLines="0" />
	Cisco Systems
    <vspace blankLines="0" />
    Email: swallow@cisco.com</t>

	<t>Wim Henderickx
    <vspace blankLines="0" />
	Alcatel-Lucent
    <vspace blankLines="0" />
    Email: wim.henderickx@alcatel-lucent.com</t>

	<t>Nobo Akiya
    <vspace blankLines="0" />
	Cisco Systems
    <vspace blankLines="0" />
    Email: nobo@cisco.com</t>

	<t>Neil Ketley
    <vspace blankLines="0" />
	Cisco Systems
    <vspace blankLines="0" />
    Email: nketley@cisco.com</t>

	<t>Carlos Pignataro
    <vspace blankLines="0" />
	Cisco Systems
    <vspace blankLines="0" />
    Email: cpignata@cisco.com</t>

	<t>Nitin Bahadur
    <vspace blankLines="0" />
	Juniper Networks
    <vspace blankLines="0" />
    Email: nitinb@juniper.net</t>

	<t>Zuliang Wang
    <vspace blankLines="0" />
	Huawei Technologies
    <vspace blankLines="0" />
    Email: liang_tsing@huawei.com</t>

	<t>Liang Guo
    <vspace blankLines="0" />
	China Telecom
    <vspace blankLines="0" />
    Email: guoliang@gsta.com</t>

	<t>Jeff Tantsura
    <vspace blankLines="0" />
	Ericsson
    <vspace blankLines="0" />
    Email: jeff.tantsura@ericsson.com</t>

	</section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119"?>

      <?rfc include="reference.RFC.5880"?>

      <?rfc include="reference.RFC.5881"?>

      <?rfc include="reference.RFC.5882"?>
    </references>

    <references title="Informative References">
      <?rfc include="reference.RFC.5342"?>

      <reference anchor="IEEE802.1AX">
        <front>
          <title>IEEE Standard for Local and metropolitan area
		  networks - Link Aggregation
          </title>
		  <author> <organization>IEEE Std. 802.1AX</organization> </author>
          <date month="November" year="2008" />
        </front>
      </reference>

    </references>


  </back>
</rfc>

