<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>

<rfc xmlns:xi="http://www.w3.org/2001/XInclude"
     category="std"
     docName="draft-yang-srv6-precision-flow-control-00"
     ipr="trust200902"
     submissionType="IETF"
     consensus="true"
     version="3">

  <front>
    <title abbrev="SRv6 Precision Flow Control">
      Flow-Level Precision Congestion Control for SRv6 Networks
    </title>

    <seriesInfo name="Internet-Draft" value="draft-yang-srv6-precision-flow-control-00"/>
    <author fullname="Jin Yang" initials="J." surname="Yang">
      <organization>China Mobile</organization>
      <address>
        <postal>
          <street/>
          <city>Beijing</city>
          <code>100053</code>
          <country>China</country>
        </postal>
        <email>yangjinwl@chinamobile.com</email>
      </address>
    </author>

    <author fullname="Weiqiang Cheng" initials="W." surname="Cheng">
      <organization>China Mobile</organization>
      <address>
        <postal>
          <street/>
          <city>Beijing</city>
          <code>100053</code>
          <country>China</country>
        </postal>
        <email>chengweiqiang@chinamobile.com</email>
      </address>
    </author>

    <author fullname="Ming Zhou" initials="M." surname="Zhou">
      <organization>China Mobile</organization>
      <address>
        <postal>
          <street/>
          <city>Beijing</city>
          <code>100053</code>
          <country>China</country>
        </postal>
        <email>zhoumingyjy@chinamobile.com</email>
      </address>
    </author>

    <author fullname="Junjie Wang" initials="J." surname="Wang">
      <organization>Centec</organization>
      <address>
        <postal>
          <street/>
          <city>Suzhou</city>
          <code>215000</code>
          <country>China</country>
        </postal>
        <email>wangjj@centec.com</email>
      </address>
    </author>

    <author fullname="Guoying Zhang" initials="G." surname="Zhang">
      <organization>Centec</organization>
      <address>
        <postal>
          <street/>
          <city>Suzhou</city>
          <code>215000</code>
          <country>China</country>
        </postal>
        <email>zhanggy@centec.com</email>
      </address>
    </author>

    <date year="2026" month="March" day="1"/>

    <area>Routing</area>
    <workgroup>SPRING Working Group</workgroup>

    <keyword>SRv6</keyword>
    <keyword>Congestion Control</keyword>
    <keyword>PFC</keyword>
    <keyword>Lossless Network</keyword>

    <abstract>
      <t>
        This document defines a flow-level precision congestion control
        mechanism for SRv6 networks. The mechanism specifies new congestion 
        notification message formats that enable per-flow congestion 
        information delivery and hop-by-hop backpressure control. Compared
        to traditional Priority-based Flow Control (PFC), which operates
        at the queue level, this mechanism provides finer-grained congestion
        control suitable for Wide-Area Network (WAN) environments, mitigating
        head-of-line blocking, congestion spreading, and deadlock issues.
        The document also describes interoperability models with traditional 
        IEEE 802.1Qbb PFC.
      </t>
    </abstract>
  </front>

  <middle>
    <section numbered="true" toc="default">
      <name>Introduction</name>
      <t>
        Workloads such as distributed AI training, Remote Direct Memory Access
        (RDMA) over Converged Ethernet version 2 (RoCEv2), and disaggregated
        storage-compute architectures require lossless transmission of large
        volumes of bursty traffic. As these services expand beyond data centers
        across Wide-Area Networks (WANs), maintaining zero-packet-loss
        guarantees becomes increasingly challenging.
      </t>
      <t>
        Traditional Priority-based Flow Control (PFC), as defined in IEEE
        802.1Qbb, is a Data Link Layer flow control mechanism primarily designed 
        for intra-data center networks. When applied to WAN scenarios with higher 
        Bandwidth-Delay Products (BDP), PFC faces severe structural limitations:
      </t>
      <ul spacing="normal">
        <li>
          High Propagation Latency: WAN propagation delays are orders of
          magnitude larger than those in data center networks. While a PFC
          PAUSE frame travels to the upstream node, that node keeps
          transmitting, so the congestion point must absorb roughly one
          round trip of traffic or its buffer overflows.
        </li>
        <li>
          Coarse Control Granularity: PFC operates at the priority-queue
          level. Congestion caused by a single micro-burst pauses every flow
          mapped to that Traffic Class (TC), including flows that do not
          contribute to the congestion. This collateral damage is known as
          Head-of-Line (HOL) blocking.
        </li>
        <li>
          Deadlock Vulnerability: In topologies with cyclic buffer
          dependencies, or under prolonged congestion, hop-by-hop queue-level
          pausing can leave a set of nodes each waiting for the next one to
          resume, a condition known as PFC deadlock.
        </li>
      </ul>
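      <t>
        As a rough illustration of the first limitation, the following sketch
        estimates how much data keeps arriving at a congested node after it
        sends a PAUSE frame. It is a back-of-the-envelope calculation, not part
        of the mechanism. The link rate, the two link lengths, and the helper
        name <tt>pause_headroom_bytes</tt> are assumptions chosen for
        illustration, and the propagation delay uses the common approximation
        of 5 ns per meter of fiber.
      </t>
      <sourcecode type="python"><![CDATA[
# Sketch only: link rate and distances are assumed example values.
NS_PER_METER = 5   # light in fiber covers about 1 m every 5 ns

def pause_headroom_bytes(link_rate_bps, distance_m):
    """Bytes arriving between sending PAUSE and the flow stopping.

    The PAUSE frame takes one propagation delay to reach the
    upstream node, and data already on the wire takes another to
    drain, so about one round trip of traffic must be absorbed.
    """
    rtt_ns = 2 * distance_m * NS_PER_METER
    return link_rate_bps * rtt_ns // (8 * 10**9)

RATE = 100 * 10**9                                # 100 Gb/s
print(pause_headroom_bytes(RATE, 100))            # 100 m link
print(pause_headroom_bytes(RATE, 1_000_000))      # 1000 km link
]]></sourcecode>
      <t>
        Under these assumptions, a 100 m data center link needs about 12.5
        kilobytes of headroom per paused queue, while a 1000 km WAN link needs
        about 125 megabytes.
      </t>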
      <t>
        To address these limitations, this document proposes a Flow-Level Precision 
        Congestion Control mechanism. Operating within SRv6 networks, it allows 
        network nodes to uniquely identify congested IP flows and explicitly 
        signal upstream nodes to enforce granular rate reduction or pause actions 
        exclusively on the offending flows.
      </t>
      
      <section numbered="true" toc="default">
        <name>Requirements Language</name>
        <t>
          The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
          "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
          "MAY", and "OPTIONAL" in this document are to be interpreted as
          described in BCP 14 <xref target="RFC2119" format="default"/>
          <xref target="RFC8174" format="default"/> when, and only when,
          they appear in all capitals, as shown here.
        </t>
      </section>
    </section>

    <section numbered="true" toc="default">
      <name>Terminology</name>
      <dl>
        <dt>PFC (Priority-based Flow Control):</dt>
        <dd>
          A Link Layer flow control mechanism defined in IEEE 802.1Qbb
          that pauses transmission of a specific priority queue on a link.
        </dd>
        <dt>Stream ID:</dt>
        <dd>
          An identifier locally or globally allocated by network nodes to uniquely 
          distinguish an upper-layer micro-flow within the SRv6 routing domain.
        </dd>
        <dt>PFCM (Precision Flow Control Message):</dt>
        <dd>
          A newly defined IPv6 signaling message, carried either as an ICMPv6
          message or as an option in an IPv6 Hop-by-Hop Options or Destination
          Options header, used to convey per-flow backpressure signals.
        </dd>
        <dt>Precision Flow Control Time:</dt>
        <dd>
          The duration for which a targeted congestion control action (e.g., rate 
          reduction or pause) MUST be maintained, measured in microseconds.
        </dd>
      </dl>
    </section>

    <section numbered="true" toc="default">
      <name>Protocol Operations</name>

      <section numbered="true" toc="default">
        <name>Architecture Overview</name>
        <t>
          The mechanism operates within the SRv6 data plane
          (<xref target="RFC8402"/>, <xref target="RFC8754"/>).
          To support Flow-Level Precision Congestion Control, participating routing 
          nodes are REQUIRED to implement the following functional components:
        </t>
        <ul spacing="normal">
          <li>Flow Classification and Stream ID Management</li>
          <li>Per-flow state monitoring and buffer threshold management</li>
          <li>PFCM Generation (Downstream Node)</li>
          <li>PFCM Processing and Enforcement (Upstream Node)</li>
        </ul>
      </section>

      <section numbered="true" toc="default">
        <name>Flow Classification and Stream ID Assignment</name>
        <t>
          Forwarding nodes MUST perform flow classification to distinguish traffic 
          streams. The default classification method SHOULD utilize the IPv6 
          Flow Label (as defined in <xref target="RFC6437"/>) combined with the 
          Source and Destination IPv6 Addresses. 
        </t>
        <t>
          Alternatively, nodes MAY use the classic 5-tuple (Source IP,
          Destination IP, Protocol, Source Port, Destination Port) where the
          transport header can be parsed. Implementation-specific classifications
          (such as Deep Packet Inspection of Layer-7 headers or traffic behavioral
          heuristics) MAY be used but are outside the scope of this document.
        </t>
        <t>
          Upon identifying a new flow, the node allocates a <tt>Stream ID</tt>
          that is unique within its scope. Stream IDs can be locally significant
          (meaningful only between two adjacent hops) or globally coordinated
          (e.g., by an SDN controller across the SRv6 domain).
        </t>
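        <t>
          The default classification and a locally significant allocation
          strategy could be sketched as follows. This is a minimal illustration
          under stated assumptions, not a required implementation: the class
          <tt>StreamIdTable</tt> and its methods are hypothetical names,
          sequential allocation is only one possible policy, and leaving the
          value 0 unused is a choice made here that the document does not
          mandate. The upper bound follows from the 16-bit Stream ID field.
        </t>
        <sourcecode type="python"><![CDATA[
import ipaddress

class StreamIdTable:
    """Locally significant Stream ID allocator (16-bit ID space)."""

    MAX_ID = 0xFFFF

    def __init__(self):
        self._ids = {}       # flow key -> Stream ID
        self._next = 1       # 0 is left unused here (an assumption)

    @staticmethod
    def flow_key(src, dst, flow_label):
        # Default classifier: source, destination, 20-bit Flow Label
        if not 0 <= flow_label <= 0xFFFFF:
            raise ValueError("Flow Label is a 20-bit field")
        return (ipaddress.IPv6Address(src),
                ipaddress.IPv6Address(dst), flow_label)

    def lookup_or_allocate(self, src, dst, flow_label):
        key = self.flow_key(src, dst, flow_label)
        if key not in self._ids:
            if self._next > self.MAX_ID:
                raise RuntimeError("Stream ID space exhausted")
            self._ids[key] = self._next
            self._next += 1
        return self._ids[key]
]]></sourcecode>
        <t>
          Parsing the addresses before building the key means that different
          textual spellings of the same IPv6 address map to the same flow.
        </t>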
      </section>

      <section numbered="true" toc="default">
        <name>Congestion Detection and Forwarding Behavior</name>
        <t>
          Precision congestion control proceeds through the following
          steps:
        </t>
        <ol spacing="normal">
          <li>
            <t>Congestion Detection (Local State):</t>
            <t>
              A node actively monitors its egress buffer occupancy for each identified 
              flow. When the instantaneous or average buffer depth for a specific 
              <tt>Stream ID</tt> exceeds a pre-configured high-water mark threshold, 
              the node transitions to the Congested state.
            </t>
          </li>
          <li>
            <t>PFCM Generation (Signaling):</t>
            <t>
              The congested node generates a Precision Flow Control Message (PFCM). 
              The PFCM encapsulates the offending <tt>Stream ID</tt>, the local 
              <tt>Queue ID</tt>, the requested <tt>Action</tt> (e.g., reduce rate 
              by 50%), and the <tt>Precision Flow Control Time</tt>.
            </t>
          </li>
          <li>
            <t>Reverse Path Transmission:</t>
            <t>
              The PFCM is transmitted to the directly connected upstream node from
              which the congested flow was received. The PFCM SHOULD be addressed to
              the upstream neighbor's link-local IPv6 address.
            </t>
          </li>
          <li>
            <t>Upstream Enforcement (Backpressure):</t>
            <t>
              Upon reception of a PFCM, the upstream node parses the <tt>Stream ID</tt> 
              and maps it to its local forwarding state. It MUST immediately apply the 
              specified <tt>Action</tt> for the duration of the <tt>Precision Flow Control Time</tt>. 
              If the upstream node cannot absorb the backpressure locally, it MAY 
              recursively generate a new PFCM to its own upstream node.
            </t>
          </li>
        </ol>
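        <t>
          The detection and enforcement steps above can be sketched as follows.
          This is an illustration under assumptions rather than a normative
          algorithm. The names <tt>Pfcm</tt>, <tt>detect</tt>, and
          <tt>UpstreamFlow</tt> are hypothetical, the 50% ratio and 100
          microsecond duration are example values, and interpreting a
          reduction as a percentage of the flow's configured rate is an
          assumption made here. Message transmission between the two nodes is
          omitted.
        </t>
        <sourcecode type="python"><![CDATA[
from dataclasses import dataclass

PAUSE, REDUCE = 0b01, 0b10   # action types from the Action field

@dataclass(frozen=True)
class Pfcm:
    stream_id: int
    queue_id: int
    action_type: int
    ratio_percent: int
    time_us: int

def detect(stream_id, queue_id, depth_bytes, high_water_bytes,
           time_us=100):
    """Steps 1-2: build a PFCM if the flow exceeds its threshold."""
    if depth_bytes <= high_water_bytes:
        return None
    return Pfcm(stream_id, queue_id, REDUCE, 50, time_us)

class UpstreamFlow:
    """Step 4: per-flow enforcement state on the upstream node."""

    def __init__(self, rate_bps):
        self.configured_bps = rate_bps
        self.limit_bps = rate_bps
        self.expires_us = 0

    def apply(self, msg, now_us):
        if msg.action_type == PAUSE:
            self.limit_bps = 0
        elif msg.action_type == REDUCE:
            cut = self.configured_bps * msg.ratio_percent // 100
            self.limit_bps = self.configured_bps - cut
        else:
            # Assumption: any other value lifts the restriction
            self.limit_bps = self.configured_bps
        self.expires_us = now_us + msg.time_us

    def allowed_bps(self, now_us):
        # The action lapses when Precision Flow Control Time ends
        if now_us >= self.expires_us:
            self.limit_bps = self.configured_bps
        return self.limit_bps
]]></sourcecode>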
      </section>

      <section numbered="true" toc="default">
        <name>Interoperability with Legacy L2 PFC</name>
        <t>
          Heterogeneous networks may contain legacy devices that cannot perform
          L3 per-flow control. For backward compatibility, a border node that
          receives a PFCM MAY translate it into an IEEE 802.1Qbb L2 PFC frame.
        </t>
        <t>
          In such translation operations:
        </t>
        <ul spacing="normal">
          <li>
            The <tt>Queue ID</tt> field in the PFCM MUST be mapped to the
            corresponding bit of the priority enable vector in the PFC frame.
          </li>
          <li>
            The <tt>Precision Flow Control Time</tt> (microseconds) MUST be
            converted into PFC pause quanta, where one quantum is the time
            needed to transmit 512 bits at the speed of the outgoing link.
          </li>
        </ul>
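        <t>
          A possible form of this translation is sketched below. The function
          names are hypothetical. Rounding up to a whole quantum, saturating at
          the 16-bit maximum of the PFC time field, and mapping Queue ID values
          0-7 one-to-one onto PFC priorities are assumptions made for this
          sketch. One pause quantum is taken as 512 bit times at the link
          speed.
        </t>
        <sourcecode type="python"><![CDATA[
PAUSE_QUANTUM_BITS = 512   # one pause quantum is 512 bit times
MAX_QUANTA = 0xFFFF        # PFC time fields are 16 bits wide

def time_us_to_quanta(time_us, link_rate_bps):
    """Round up to whole quanta and saturate at the field maximum."""
    bits = time_us * link_rate_bps          # bit times, scaled by 1e6
    denom = PAUSE_QUANTUM_BITS * 1_000_000
    quanta = -(-bits // denom)              # ceiling division
    return min(quanta, MAX_QUANTA)

def queue_id_to_enable_vector(queue_id):
    """Set the single bit for priority queue_id (0-7)."""
    if not 0 <= queue_id <= 7:
        raise ValueError("PFC defines eight priorities (0-7)")
    return 1 << queue_id

# 100 us on a 10 Gb/s link: 100 us / 51.2 ns = 1953.125 quanta
print(time_us_to_quanta(100, 10 * 10**9))   # rounds up to 1954
]]></sourcecode>
        <t>
          The saturation matters on fast links. At 100 Gb/s one quantum lasts
          5.12 ns, so a single PFC frame can pause for at most about 335
          microseconds, which is far less than the largest value the 16-bit
          microsecond field of a PFCM can carry.
        </t>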
      </section>
    </section>

    <section numbered="true" toc="default">
      <name>Packet Formats</name>

      <section numbered="true" toc="default">
        <name>IPv6 Extension Header Format</name>
        <t>
          Precision flow control information MAY be carried as an option in an
          IPv6 Hop-by-Hop Options header or Destination Options header
          (<xref target="RFC8200"/>). This in-band form is suited to cases where
          the signal can be piggybacked on reverse-path traffic.
        </t>
        <figure>
          <name>IPv6 Option Format for Precision Flow Control</name>
          <artwork type="ascii-art"><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Option Type  | Opt Data Len  |     Type      |    Reserved   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Stream ID           |    Queue ID   |     Action    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Precision Flow Ctrl Time   |           Reserved            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                    Destination IPv6 Address                   |
+                 (Original Congested Packet)                   +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                      Source IPv6 Address                      |
+                 (Original Congested Packet)                   +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
        </figure>
        <t>
          The fields are defined as follows:
        </t>
        <dl>
          <dt>Option Type (8 bits):</dt>
          <dd>
            Identifies the precision flow control option. Value TBA by IANA. 
            The highest-order 2 bits SHOULD be set to '00' (skip over if not recognized).
          </dd>
          <dt>Opt Data Len (8 bits):</dt>
          <dd>
            Length of the option data in octets, excluding the Option Type and
            Opt Data Len fields (42 for the format shown above).
          </dd>
          <dt>Type (8 bits):</dt>
          <dd>
            Sub-type for precision flow control. MUST be set to 0; other values
            are reserved for future versions.
          </dd>
          <dt>Stream ID (16 bits):</dt>
          <dd>
            The identifier of the flow causing congestion.
          </dd>
          <dt>Queue ID (8 bits):</dt>
          <dd>
            The physical or logical priority queue experiencing congestion.
          </dd>
          <dt>Action (8 bits):</dt>
          <dd>
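            Specifies the congestion mitigation directive.
            Bits [0:1] specify the action type: <tt>00</tt> = No Backpressure,
            <tt>01</tt> = Pause Flow, <tt>10</tt> = Reduce Rate; the value
            <tt>11</tt> is not defined by this document.
            Bits [2:7] carry the rate reduction ratio, in percent, when the
            action type is <tt>10</tt>. Because this subfield is 6 bits wide,
            it can encode only the values 0-63; the encoding of ratios above
            63% is not specified in this version.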
          </dd>
          <dt>Precision Flow Ctrl Time (16 bits):</dt>
          <dd>
            The duration of the specified action, in microseconds (0-65535).
          </dd>
          <dt>Destination &amp; Source IPv6 Addresses (128 bits each):</dt>
          <dd>
            The IP addresses copied from the data packet that triggered the
            congestion event. They allow the upstream node to correlate the
            message with its local forwarding state.
          </dd>
        </dl>
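        <t>
          The layout above can be exercised with a short encoder and decoder,
          sketched below under assumptions. The Option Type is not yet
          assigned, so the sketch uses 0x1E, one of the experimental option
          types, purely as a placeholder. The function names are hypothetical.
          The sketch also assumes the usual RFC bit numbering in which bit 0 is
          the most significant bit, so the action type occupies the top two
          bits of the Action octet; a 6-bit ratio subfield can hold only the
          values 0-63.
        </t>
        <sourcecode type="python"><![CDATA[
import ipaddress
import struct

OPT_TYPE = 0x1E              # placeholder: experimental option type
_FMT = "!BBBBHBBHH16s16s"    # 44 octets, including 2-octet header

def encode_action(action_type, ratio=0):
    # Assumed numbering: bit 0 is the most significant bit, so
    # bits [0:1] are the top two bits and bits [2:7] the low six.
    if not 0 <= action_type <= 3 or not 0 <= ratio <= 63:
        raise ValueError("field out of range")
    return (action_type << 6) | ratio

def pack_option(stream_id, queue_id, action, time_us, dst, src):
    data_len = struct.calcsize(_FMT) - 2   # excludes Type and Len
    return struct.pack(
        _FMT, OPT_TYPE, data_len, 0, 0,
        stream_id, queue_id, action, time_us, 0,
        ipaddress.IPv6Address(dst).packed,
        ipaddress.IPv6Address(src).packed)

def unpack_option(buf):
    (_, _, _, _, stream_id, queue_id, action, time_us, _,
     dst, src) = struct.unpack(_FMT, buf)
    return {
        "stream_id": stream_id, "queue_id": queue_id,
        "action_type": action >> 6, "ratio": action & 0x3F,
        "time_us": time_us,
        "dst": ipaddress.IPv6Address(dst),
        "src": ipaddress.IPv6Address(src)}
]]></sourcecode>
        <t>
          The encoded option is 44 octets long. An options header that carries
          only this option therefore needs two octets of padding after its own
          two-octet header to reach a multiple of 8 octets, as
          <xref target="RFC8200"/> requires.
        </t>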
      </section>

      <section numbered="true" toc="default">
        <name>ICMPv6 Message Format</name>
        <t>
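          Out-of-band signaling uses ICMPv6 messages (<xref target="RFC4443"/>).
          Because the congested node generates the message itself, signaling does
          not depend on the availability of reverse-path data traffic. ICMPv6
          does not guarantee delivery.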
        </t>
        <figure>
          <name>ICMPv6 Message Format for Precision Flow Control</name>
          <artwork type="ascii-art"><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Type     |      Code     |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Reserved            |           Stream ID           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Queue ID   |     Action    |    Precision Flow Ctrl Time   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                    Destination IPv6 Address                   |
+                 (Original Congested Packet)                   +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                      Source IPv6 Address                      |
+                 (Original Congested Packet)                   +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
        </figure>
        <t>
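          The ICMPv6 header fields are defined as follows. The Stream ID,
          Queue ID, Action, Precision Flow Ctrl Time, and address fields have
          the same meaning as in the IPv6 option format.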
        </t>
        <dl>
          <dt>Type (8 bits):</dt>
          <dd>
            A new ICMPv6 message type, to be assigned by IANA, indicating a
            <tt>Precision Flow Control Congestion Notification</tt>.
          </dd>
          <dt>Code (8 bits):</dt>
          <dd>
            ICMPv6 message sub-type (0x00 default).
          </dd>
          <dt>Checksum (16 bits):</dt>
          <dd>
            The standard ICMPv6 checksum (<xref target="RFC4443"/>).
          </dd>
        </dl>
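        <t>
          The following sketch builds the ICMPv6 form of the message and
          computes its checksum over the IPv6 pseudo-header, as ICMPv6
          requires. It is an illustration under assumptions. The Type value is
          not yet assigned, so 200, one of the values set aside for private
          experimentation, stands in as a placeholder. The function names are
          hypothetical, and the address fields are encoded as full 128-bit
          values.
        </t>
        <sourcecode type="python"><![CDATA[
import ipaddress
import struct

ICMP_TYPE = 200   # placeholder: private experimentation value

def _ones_complement_sum(data):
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def icmpv6_checksum(ip_src, ip_dst, message):
    # Pseudo-header: source, destination, upper-layer length,
    # three zero octets, Next Header = 58 (ICMPv6).
    pseudo = (ipaddress.IPv6Address(ip_src).packed
              + ipaddress.IPv6Address(ip_dst).packed
              + struct.pack("!I3xB", len(message), 58))
    return ~_ones_complement_sum(pseudo + message) & 0xFFFF

def build_pfcm(ip_src, ip_dst, stream_id, queue_id, action,
               time_us, flow_dst, flow_src):
    body = struct.pack(
        "!HHBBH16s16s", 0, stream_id, queue_id, action, time_us,
        ipaddress.IPv6Address(flow_dst).packed,
        ipaddress.IPv6Address(flow_src).packed)
    # Checksum is computed with the Checksum field set to zero
    draft = struct.pack("!BBH", ICMP_TYPE, 0, 0) + body
    csum = icmpv6_checksum(ip_src, ip_dst, draft)
    return struct.pack("!BBH", ICMP_TYPE, 0, csum) + body
]]></sourcecode>
        <t>
          A receiver can validate a message by running
          <tt>icmpv6_checksum</tt> over the received octets, checksum
          included; the result is zero for an intact message.
        </t>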
      </section>
    </section>

    <section numbered="true" toc="default">
      <name>Security Considerations</name>
      <t>
        The introduction of L3/L4 flow-level pause and backpressure signaling 
        inherently expands the attack surface of the network architecture. 
        An attacker could spoof PFCM packets to pause or throttle arbitrary
        flows, resulting in Denial of Service (DoS).
      </t>
      <t>
        To mitigate these threats, the following security constraints MUST be 
        enforced by compliant implementations:
      </t>
      <ul spacing="normal">
        <li>
          <t>Hop Limit Verification:</t>
          <t>
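            When processing an ICMPv6 PFCM, a node MUST verify that the IPv6 Hop
            Limit is exactly 255. Packets arriving with a smaller Hop Limit MUST be
            silently discarded. Because every router decrements the Hop Limit when
            forwarding, a value of 255 shows that the message was sent by an on-link
            neighbor; this requires the sender to set the Hop Limit to 255.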
          </t>
        </li>
        <li>
          <t>Cryptographic Authentication:</t>
          <t>
            In untrusted or multi-tenant transport domains, the precision flow control 
            messages SHOULD be secured using the IPsec Authentication Header (AH) or 
            Encapsulating Security Payload (ESP) to ensure data integrity and neighbor 
            origin authentication.
          </t>
        </li>
        <li>
          <t>Rate Limiting:</t>
          <t>
            Nodes MUST implement strict control-plane policing (CoPP) and rate limiting 
            for PFCM processing to prevent CPU resource exhaustion attacks.
          </t>
        </li>
      </ul>
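      <t>
        The Hop Limit check and the rate limiting requirement could be combined
        on the receive path roughly as sketched below. This is an illustration
        only. The token bucket is one common policing technique chosen here as
        an assumption; the document does not mandate a specific algorithm. The
        names <tt>TokenBucket</tt> and <tt>accept_pfcm</tt> and the rate and
        burst parameters are hypothetical.
      </t>
      <sourcecode type="python"><![CDATA[
class TokenBucket:
    """Admit `rate` messages per second, bursts up to `burst`."""

    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst,
                          self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

def accept_pfcm(hop_limit, bucket, now):
    # A Hop Limit of 255 means no router forwarded the packet.
    if hop_limit != 255:
        return False          # silently discard
    return bucket.allow(now)  # police the remaining messages
]]></sourcecode>
      <t>
        Checking the Hop Limit before consuming a token keeps off-link
        traffic from exhausting the budget reserved for legitimate
        neighbors.
      </t>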
    </section>

    <section numbered="true" toc="default">
      <name>IANA Considerations</name>
      <t>
        This document requests the following allocations from IANA:
      </t>
      <ol spacing="normal">
        <li>
          A new Option Type in the "Destination Options and Hop-by-Hop Options" 
          registry for the <tt>Precision Flow Control Congestion Notification</tt>.
        </li>
        <li>
          A new Type value in the "ICMPv6 Type Numbers" registry for the 
          <tt>Precision Flow Control Congestion Notification</tt> messages.
        </li>
      </ol>
    </section>
  </middle>

  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        <reference anchor="RFC2119" target="https://www.rfc-editor.org/info/rfc2119">
          <front>
            <title>Key words for use in RFCs to Indicate Requirement Levels</title>
            <author initials="S." surname="Bradner" fullname="S. Bradner"/>
            <date month="March" year="1997"/>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="2119"/>
          <seriesInfo name="DOI" value="10.17487/RFC2119"/>
        </reference>
        <reference anchor="RFC8174" target="https://www.rfc-editor.org/info/rfc8174">
          <front>
            <title>Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words</title>
            <author initials="B." surname="Leiba" fullname="B. Leiba"/>
            <date month="May" year="2017"/>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="8174"/>
          <seriesInfo name="DOI" value="10.17487/RFC8174"/>
        </reference>
        <reference anchor="RFC4443" target="https://www.rfc-editor.org/info/rfc4443">
          <front>
            <title>Internet Control Message Protocol (ICMPv6) for the Internet Protocol Version 6 (IPv6) Specification</title>
            <author initials="A." surname="Conta" fullname="A. Conta"/>
            <author initials="S." surname="Deering" fullname="S. Deering"/>
            <author initials="M." surname="Gupta" fullname="M. Gupta" role="editor"/>
            <date month="March" year="2006"/>
          </front>
          <seriesInfo name="STD" value="89"/>
          <seriesInfo name="RFC" value="4443"/>
          <seriesInfo name="DOI" value="10.17487/RFC4443"/>
        </reference>
        <reference anchor="RFC8200" target="https://www.rfc-editor.org/info/rfc8200">
          <front>
            <title>Internet Protocol, Version 6 (IPv6) Specification</title>
            <author initials="S." surname="Deering" fullname="S. Deering"/>
            <author initials="R." surname="Hinden" fullname="R. Hinden"/>
            <date month="July" year="2017"/>
          </front>
          <seriesInfo name="STD" value="86"/>
          <seriesInfo name="RFC" value="8200"/>
          <seriesInfo name="DOI" value="10.17487/RFC8200"/>
        </reference>
      </references>
      <references>
        <name>Informative References</name>
        <reference anchor="RFC6437" target="https://www.rfc-editor.org/info/rfc6437">
          <front>
            <title>IPv6 Flow Label Specification</title>
            <author initials="S." surname="Amante" fullname="S. Amante"/>
            <author initials="B." surname="Carpenter" fullname="B. Carpenter"/>
            <author initials="S." surname="Jiang" fullname="S. Jiang"/>
            <author initials="J." surname="Rajahalme" fullname="J. Rajahalme"/>
            <date month="November" year="2011"/>
          </front>
          <seriesInfo name="RFC" value="6437"/>
          <seriesInfo name="DOI" value="10.17487/RFC6437"/>
        </reference>
        <reference anchor="RFC8754" target="https://www.rfc-editor.org/info/rfc8754">
          <front>
            <title>IPv6 Segment Routing Header (SRH)</title>
            <author initials="C." surname="Filsfils" fullname="C. Filsfils" role="editor"/>
            <author initials="D." surname="Dukes" fullname="D. Dukes" role="editor"/>
            <author initials="S." surname="Previdi" fullname="S. Previdi"/>
            <author initials="J." surname="Leddy" fullname="J. Leddy"/>
            <author initials="S." surname="Matsushima" fullname="S. Matsushima"/>
            <author initials="D." surname="Voyer" fullname="D. Voyer"/>
            <date month="March" year="2020"/>
          </front>
          <seriesInfo name="RFC" value="8754"/>
          <seriesInfo name="DOI" value="10.17487/RFC8754"/>
        </reference>
        <reference anchor="RFC8402" target="https://www.rfc-editor.org/info/rfc8402">
          <front>
            <title>Segment Routing Architecture</title>
            <author initials="C." surname="Filsfils" fullname="C. Filsfils" role="editor"/>
            <author initials="S." surname="Previdi" fullname="S. Previdi" role="editor"/>
            <author initials="L." surname="Ginsberg" fullname="L. Ginsberg"/>
            <author initials="B." surname="Decraene" fullname="B. Decraene"/>
            <author initials="S." surname="Litkowski" fullname="S. Litkowski"/>
            <author initials="R." surname="Shakir" fullname="R. Shakir"/>
            <date month="July" year="2018"/>
          </front>
          <seriesInfo name="RFC" value="8402"/>
          <seriesInfo name="DOI" value="10.17487/RFC8402"/>
        </reference>
      </references>
    </references>

    <section numbered="false" toc="default">
      <name>Acknowledgements</name>
      <t>
        The authors would like to thank the contributors and reviewers who
        provided valuable feedback on this document.
      </t>
    </section>
  </back>
</rfc>
