CATS                              Z. Du
Internet-Draft                    K. Yao
Intended status: Informational    China Mobile
Expires: 4 September 2025         G. Huang
                                  ZTE
                                  Z. Fu
                                  New H3C Technologies
                                  3 March 2025

          Default Policy and Related Metrics in CATS
         draft-du-cats-default-policy-and-metrics-00

Abstract

This document describes the considerations and requirements for the computing information that needs to be notified into the network in Computing-Aware Traffic Steering (CATS). In particular, it suggests that a default policy be supported on the Ingress, and that one or more default metrics be included in the notification.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 4 September 2025.

Copyright Notice

Copyright (c) 2025 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components

Du, et al.              Expires 4 September 2025              [Page 1]

Internet-Draft     Default Policy and Related Metrics       March 2025

extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.

Table of Contents

1.  Introduction
2.  Definition of Terms
3.  General Procedure of Computing-Aware Traffic Steering
4.  Requirements of Computing Resource Modeling
5.  Design principle of Computing Metrics
6.  Computing Metric Considerations in CATS
7.  Default Policy Discussion On Decision Point
8.  Security Considerations
9.  IANA Considerations
10. Acknowledgements
11. Contributors
12. Informative References
Authors' Addresses

1.  Introduction

Computing-Aware Traffic Steering (CATS) is proposed to support steering traffic among different service sites according to both the real-time network status and the real-time computing resource status, as described in [I-D.ietf-cats-usecases-requirements]. It requires the network to be aware of computing resource information and to select a service instance based on a joint metric of computing and networking.

The basic idea of CATS is that the network is notified about computing information and changes its steering policy accordingly. The first question, therefore, is what information the network should be aware of. The computing domain holds many kinds of computing information, and one or several of them need to be selected as the default ones.
This document describes the considerations and requirements for designing the computing information, and suggests a two-stage strategy for defining the metrics. First, some default metrics that are necessary for enabling the basic steering mechanism in CATS can be defined. Afterwards, other metrics can be added as needed, which requires the metric design to be extensible.

The rest of this document is organized as follows. Section 3 reviews the procedure of traffic steering in CATS and shows where the computing information is needed. Section 4 and Section 5 analyze the requirements and the principles of computing metric design. Section 6 describes some computing metric considerations in CATS. Finally, Section 7 discusses the default policy on the decision point.

2.  Definition of Terms

This document makes use of the following terms:

Computing-Aware Traffic Steering (CATS):  A traffic engineering approach [I-D.ietf-teas-rfc3272bis] that takes into account the dynamic nature of computing resources and network state to optimize service-specific traffic forwarding towards a given service contact instance. Various relevant metrics may be used to enforce such computing-aware traffic steering policies.

Service:  An offering that is made available by a provider by orchestrating a set of resources (networking, compute, storage, etc.).

Service instance:  An instance of running resources according to a given service logic.

Service identifier:  Used to uniquely identify a service, at the same time identifying the whole set of service instances that each represent the same service behavior, no matter where those service instances are running.

Computing Capability:  The ability of nodes with computing resources to achieve a specific result output through data processing, including but not limited to computing, communication, memory, and storage capability.

3.  General Procedure of Computing-Aware Traffic Steering

In the CATS framework, it is assumed that the same service can be provided in multiple places. The different service instances commonly have different kinds of computing resources and different utilization rates of those resources. The decision point, which is assumed to be a node in the network, needs to be aware of both the network status and the computing status, and accordingly choose a proper service point for the client.

A general procedure for steering CATS traffic is described below. CATS packets carry an anycast destination address, which is also called the service ID and is announced by the different service points.

1.  The service points collect the specific computing information that needs to be sent into the network, following a uniform format so that the decision point can understand it. Only the necessary computing information is considered in this step, to avoid exposing too much information about the service points.

2.  The service points send the computing information into the network by some means, and update it periodically or on demand.

3.  The decision point receives the computing information and makes a decision for the specific service related to the service ID. The route for the service ID on the Ingress is thereby established or updated.

4.  Traffic for the service ID that reaches the Ingress node is identified and steered according to the routing policy established in the previous step.

5.  The Egress node receives the packets of the traffic for the service ID from the Ingress, and the traffic is identified and forwarded to the corresponding service point.

The scheduling of traffic in CATS is similar to task allocation and scheduling in cloud computing.
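The five steps above can be sketched in Python as follows. This is a minimal, non-normative illustration under stated assumptions: the `MetricReport` record and its field names, the pluggable `policy` callable, the `Ingress` and `Egress` classes, and the anycast address 2001:db8::1 are all hypothetical, since this document defines neither a report format, nor a transport for step 2, nor a specific selection policy. The sketch only shows the control flow: a fresh report replaces the previous one from the same Egress, the route for the service ID is recomputed, and packets addressed to the service ID follow that route.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class MetricReport:
    """Step 1: computing information in a uniform (hypothetical) format."""
    service_id: str            # anycast address announced by service points
    egress: str                # Egress node in front of the service point
    metrics: Dict[str, float]  # only the necessary computing information


# A policy maps the current reports for one service ID to a chosen Egress.
Policy = Callable[[List[MetricReport]], str]


class Ingress:
    """Decision point, assumed here to be co-located with the Ingress."""

    def __init__(self, policy: Policy) -> None:
        self.policy = policy
        self.reports: Dict[Tuple[str, str], MetricReport] = {}
        self.routes: Dict[str, str] = {}  # service ID -> chosen Egress

    def receive(self, report: MetricReport) -> None:
        # Steps 2-3: a periodic or on-demand update replaces the previous
        # report from the same Egress; the route is then recomputed.
        self.reports[(report.service_id, report.egress)] = report
        current = [r for r in self.reports.values()
                   if r.service_id == report.service_id]
        self.routes[report.service_id] = self.policy(current)

    def steer(self, dst: str) -> Optional[str]:
        # Step 4: packets to the service ID follow the installed route.
        return self.routes.get(dst)


class Egress:
    def __init__(self, local_instances: Dict[str, str]) -> None:
        self.local_instances = local_instances  # service ID -> service point

    def deliver(self, dst: str) -> str:
        # Step 5: hand the packet to the local service point.
        return self.local_instances[dst]


if __name__ == "__main__":
    def lowest_delay(reports: List[MetricReport]) -> str:
        # Placeholder policy for the example only.
        return min(reports, key=lambda r: r.metrics["delay"]).egress

    ingress = Ingress(policy=lowest_delay)
    ingress.receive(MetricReport("2001:db8::1", "E1", {"delay": 30.0}))
    ingress.receive(MetricReport("2001:db8::1", "E2", {"delay": 12.0}))
    print(ingress.steer("2001:db8::1"))  # E2
```

The placeholder policy in the example simply prefers the lowest reported delay; the actual policies are discussed in Section 6 and Section 7. Keeping the policy as a parameter mirrors the point that the decision point only needs to know how to use the values, not how they were generated.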
Such scheduling is normally a multi-objective optimization problem. In the task allocation strategies of cloud computing systems, the main optimization objectives are task execution time, execution cost, and load balancing. Execution time and execution cost are used to evaluate user satisfaction, while load balancing is used to measure the reliability and availability of the system.

4.  Requirements of Computing Resource Modeling

After the computing metrics are generated, they are notified to the decision point in the network to influence traffic steering. However, the decision point in the network, for example the Ingress node, only cares about how to use the capability values for traffic steering, not about how those values are generated.

From the perspective of services, an evaluation system is needed to generate one or more capability values. To achieve the best load balancing (LB) result, different services or service types may evaluate their capability in different ways. From the perspective of the decision point in the network, it only needs to understand how to use the values and to trigger the implementation of the related policy.

Some requirements related to the design of computing resource modeling are listed below.

1.  The optimization objective of the policy in the decision point may vary. For example, it may be the lowest latency, computed as the sum of the network delay and the computing delay, or it may be a better overall load balancing result, in which clients are preferentially steered to the service points that can support more clients.

2.  The update frequency of the computing metrics may vary. Some metrics are more dynamic, and some are relatively static.

3.  The ways of notifying the computing metrics may vary. A notification method may be chosen for each metric according to its update frequency.

4.  A metric merging process should be supported when multiple service instances are behind the same Egress.

5.  Design principle of Computing Metrics

The target of CATS is mainly service point selection and traffic steering at Layer 3, for which not all computing information of the service points is needed. Hence, the computing resource modeling work in CATS can start with simple cases. The following design principles can be considered.

1.  Simplicity: The computing metrics in CATS SHOULD be few and simple, so as to avoid exposing too much information about the service points.

2.  Scalability: The computing metrics in CATS SHOULD be evolvable for future extensions.

3.  Interoperability: The computing metrics in CATS SHOULD be vendor-independent and OS-independent.

4.  Stability: The computing metrics in CATS SHOULD NOT incur too much overhead in protocol design, and they need to be stable enough for use.

5.  Accuracy: The computing metrics in CATS SHOULD be effective for path selection decision making, and their accuracy SHOULD be guaranteed.

6.  Computing Metric Considerations in CATS

Various metrics can be considered in CATS, and different services may need different metrics. However, the work can start with simple cases.

In CATS, a straightforward intent is to minimize the total delay across the network domain and the computing domain. Thus, a starting point for metric design in CATS is to consider only delay information. In this case, the decision point collects the network delay and the computing delay, and makes a decision about the optimal service point accordingly. The advantage of this method is that it is simple and easy to start with; moreover, the network metric and the computing metric have the same unit of measure. The network delay can be the latency between the Ingress node and the Egress node in the network.
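The delay-only starting point, together with requirement 4 of Section 4 (merging metrics when several service instances sit behind one Egress), can be sketched as follows. The merge rule used here, advertising the smallest computing delay behind an Egress, is a hypothetical choice made for illustration; this document does not specify how metrics are merged. The function names are likewise illustrative. Because the network delay and the computing delay share a unit (milliseconds are assumed here), the decision point can simply add them.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InstanceDelay:
    instance: str
    computing_delay_ms: float  # estimated processing time of one request


def merge_at_egress(instances: List[InstanceDelay]) -> float:
    """Hypothetical merge rule: advertise the smallest computing delay among
    the instances behind this Egress, assuming the Egress hands new requests
    to that instance."""
    return min(i.computing_delay_ms for i in instances)


def min_total_delay(network_delay_ms: Dict[str, float],
                    computing_delay_ms: Dict[str, float]) -> str:
    """Return the Egress with the lowest network + computing delay.

    Only Egresses present in both maps are considered; ties are broken by
    Egress name so the result is deterministic. Raises ValueError if there
    is no candidate."""
    candidates = sorted(network_delay_ms.keys() & computing_delay_ms.keys())
    return min(candidates,
               key=lambda e: network_delay_ms[e] + computing_delay_ms[e])


if __name__ == "__main__":
    computing = {
        "E1": merge_at_egress([InstanceDelay("a", 40.0),
                               InstanceDelay("b", 25.0)]),
        "E2": merge_at_egress([InstanceDelay("c", 10.0)]),
    }
    network = {"E1": 5.0, "E2": 30.0}
    print(min_total_delay(network, computing))  # E1 (5 + 25 < 30 + 10)
```

In the example, E2 hosts the fastest single instance, yet E1 wins on total delay, which is why the network delay is added to the computing delay rather than comparing computing delays alone.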
The computing delay can be generated by the server; it is the delay that the server in the service point needs to process the CATS service. It means “the estimate of the duration of my processing of request”, and it is usually an average value over service requests. The optimization objective of traffic steering in this scenario is the minimal total delay for the client.

Another metric that can be considered is the service capability, which is the ability of the server in the service point to process the CATS service. For example, one server may support 100 simultaneous sessions while another supports 10,000 simultaneous sessions. The value can be generated by the server when the service instance is deployed.

This metric can work alone. In this scenario, the decision point can perform load balancing according to the service capability. For example, the decision process can be load balancing after pruning the service points with poor network latency metrics. The metric can also work with the computing delay metric; for example, the service points with poor total latency metrics can be pruned before load balancing.

In the future, other metrics, which may be more dynamic, can also be considered. Besides, other optimization objectives may call for other metrics, even metrics about energy consumption. However, in these cases, the decision point needs to consider more dimensions of metrics. A suggestion is to first make sure that the service point is available, which means that the service point can still accept more sessions, and then select an optimal target service point according to the optimization objective.

7.  Default Policy Discussion On Decision Point

In this section, for convenience of description, it is assumed that the decision point is on the Ingress, and that the Egress gathers the computing metrics of the corresponding service points and notifies the Ingress. After receiving the notification about the computing metrics, the Ingress can generate one or more routing policies accordingly for forwarding the packets of CATS traffic from the user. This document suggests that the routing policies include at least a default one. The packets of CATS traffic received by the Ingress should have an anycast IP address as the destination address, which triggers the pre-configured routing policy.

To enable basic cooperation in CATS, one default computing metric or a set of default computing metrics needs to be notified into the network. All CATS Ingresses need to understand the default metrics and trigger the same or similar operations inside the router. That is, an Ingress should enable some default routing policy after receiving the default metrics. The detailed procedures inside the Ingresses are vendor-specific, for example, which default policy is enabled for a specific service on the router. By comparison, other metrics would be optional, although they may achieve a better or more preferred LB result than the default ones. If the Ingress receives additional metrics and understands them, it can use these optional metrics to update the forwarding policy for the routes of the anycast IP address.

There are two kinds of forwarding treatments on the Ingress. Although they are implementations inside the equipment, a general description is given here because they are related to the default metric selection. In the first one, the Ingress deploys several routes for the anycast IP address, but only one of them is active, and the others are set to inactive as backups.
In the second one, the Ingress has multiple active routes for the anycast IP address, each with a dedicated weight, so that load balancing can be done within the Ingress.

The advantage of the first treatment is that it can select the best service instance for the client according to the network and computing status. Its disadvantage is that, until the policy is updated, the Ingress forwards all the traffic of new clients to a single service point, which can potentially make that service point busy. The second treatment may achieve a better LB result.

If the first treatment is used for a specific service, the routing policy on the Ingress may be minimum total delay first. In this case, the Ingress needs to collect the network delay and the computing delay, and add them together to select the best Egress. If the second treatment is used for a specific service, the routing policy on the Ingress may be load balancing based on service capability.

An initial proposal for the default metrics of the default policies is to always send the two metrics described in Section 6, i.e., the computing delay and the service capability. At least one of them should be valid. A computing delay or service capability field whose bits are all set to zero is considered invalid, and any other value is considered valid. Meanwhile, a computing delay or service capability field whose bits are all set to one indicates that the service point is temporarily busy, and the Ingress should not send new clients to that service point.

Alternatively, another simple metric can be added to indicate whether or not the service point is busy. However, this metric is relatively more dynamic than the former two. If the Ingress receives the busy indicator from an Egress, it can change the routing policy accordingly.
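The default-metric handling proposed above can be sketched as follows. The all-zeros (invalid) and all-ones (temporarily busy) semantics come from this section; everything else is an assumption made for illustration: the 32-bit field width, the use of microseconds for both delays, proportional weighting by capability, and the names `DefaultMetrics`, `usable`, `active_route`, and `route_weights`. `active_route` models the first forwarding treatment with a minimum-total-delay-first policy, and `route_weights` models the second treatment with weights proportional to service capability. A busy or invalid Egress is excluded from the first and receives weight zero in the second, so it gets no new clients.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical encoding: only the all-zeros / all-ones semantics come from
# the text; the 32-bit field width is an assumption.
FIELD_BITS = 32
ALL_ZEROS = 0
ALL_ONES = (1 << FIELD_BITS) - 1


@dataclass
class DefaultMetrics:
    computing_delay: int     # raw field; unit assumed to be microseconds
    service_capability: int  # raw field; e.g. simultaneous sessions

    def busy(self) -> bool:
        # Either field set to all ones: send no new clients to this point.
        return ALL_ONES in (self.computing_delay, self.service_capability)


def usable(raw: int) -> bool:
    # All zeros means "invalid"; all ones carries the busy signal, not a value.
    return raw not in (ALL_ZEROS, ALL_ONES)


def active_route(network_delay_us: Dict[str, int],
                 reports: Dict[str, DefaultMetrics]) -> Optional[str]:
    """First treatment: a single active route, minimum total delay first."""
    candidates = [e for e, m in reports.items()
                  if e in network_delay_us and not m.busy()
                  and usable(m.computing_delay)]
    if not candidates:
        return None
    return min(sorted(candidates),
               key=lambda e: network_delay_us[e] + reports[e].computing_delay)


def route_weights(reports: Dict[str, DefaultMetrics]) -> Dict[str, float]:
    """Second treatment: weights proportional to service capability.
    Busy or invalid Egresses get weight 0.0, i.e. no new clients."""
    caps = {e: (m.service_capability
                if not m.busy() and usable(m.service_capability) else 0)
            for e, m in reports.items()}
    total = sum(caps.values())
    if total == 0:
        return {e: 0.0 for e in caps}
    return {e: c / total for e, c in caps.items()}


if __name__ == "__main__":
    reports = {
        "E1": DefaultMetrics(computing_delay=8_000, service_capability=100),
        "E2": DefaultMetrics(computing_delay=2_000, service_capability=10_000),
        "E3": DefaultMetrics(computing_delay=ALL_ONES, service_capability=500),
    }
    network = {"E1": 1_000, "E2": 15_000, "E3": 500}
    print(active_route(network, reports))  # E1 (9 ms vs 17 ms; E3 is busy)
    print(route_weights(reports)["E3"])    # 0.0
```

Setting the weight to zero rather than deleting the route keeps the entry in place, so it can be restored once the busy condition clears.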
For example, the Ingress can change the path to the related Egress into an inactive one, or set the weight of that path to zero. After the change, the path to the related Egress is no longer used for forwarding CATS traffic from any new clients, i.e., it is temporarily removed from the routing policies. After a certain time, which can be pre-configured, or upon receiving a new indication of whether the service point is busy, the Ingress updates the status or the weight of the path to the related Egress, so that the path is added to the routing policies again.

8.  Security Considerations

TBD.

9.  IANA Considerations

TBD.

10.  Acknowledgements

The authors would like to thank Adrian Farrel, Joel Halpern, Tony Li, Thomas Fossati, Dirk Trossen, and Linda Dunbar for their valuable suggestions on this document.

11.  Contributors

The following people have substantially contributed to this document:

Yuexia Fu
China Mobile
fuyuexia@chinamobile.com

Jing Wang
China Mobile
wangjingjc@chinamobile.com

Peng Liu
China Mobile
liupengyjy@chinamobile.com

Wenjing Li
Beijing University of Posts and Telecommunications
wjli@bupt.edu.cn

Lanlan Rui
Beijing University of Posts and Telecommunications
llrui@bupt.edu.cn

12.  Informative References

[I-D.ietf-cats-usecases-requirements]
   Yao, K., Contreras, L. M., Shi, H., Zhang, S., and Q. An, "Computing-Aware Traffic Steering (CATS) Problem Statement, Use Cases, and Requirements", Work in Progress, Internet-Draft, draft-ietf-cats-usecases-requirements-06, 14 February 2025.

[I-D.ietf-teas-rfc3272bis]
   Farrel, A., "Overview and Principles of Internet Traffic Engineering", Work in Progress, Internet-Draft, draft-ietf-teas-rfc3272bis-27, 12 August 2023.

Authors' Addresses

Zongpeng Du
China Mobile
No.32 XuanWuMen West Street
Beijing
100053
China
Email: duzongpeng@foxmail.com

Kehan Yao
China Mobile
No.32 XuanWuMen West Street
Beijing
100053
China
Email: yaokehan@chinamobile.com

Guangping Huang
ZTE
Email: huang.guangping@zte.com.cn

Zhihua Fu
New H3C Technologies
Email: fuzhihua@h3c.com