NMOP                                                             T. Graf
Internet-Draft                                                     W. Du
Intended status: Informational                                  Swisscom
Expires: 15 October 2025                                     P. Francois
                                                           A. Huang Feng
                                                               INSA-Lyon
                                                           13 April 2025

        A Framework for a Network Anomaly Detection Architecture
            draft-ietf-nmop-network-anomaly-architecture-02

Abstract

This document describes the motivation and architecture of a Network Anomaly Detection Framework and its relationship to other documents describing network Symptom semantics and the network incident lifecycle. The described architecture for detecting IP network service interruption is designed to be generically applicable and extensible. Different applications are described, and examples are referenced with open-source running code.

Discussion Venues

This note is to be removed before publishing as an RFC.

Discussion of this document takes place on the Network Management Operations (NMOP) Working Group mailing list (nmop@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/nmop/.

Source for this draft and an issue tracker can be found at https://github.com/ietf-wg-nmop/draft-ietf-nmop-network-anomaly-architecture/.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Graf, et al.             Expires 15 October 2025                [Page 1]
Internet-Draft    Network Anomaly Detection Framework         April 2025

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 15 October 2025.

Copyright Notice

Copyright (c) 2025 IETF Trust and the persons identified as the document authors.
All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Motivation
     1.2.  Scope
   2.  Conventions and Definitions
     2.1.  Terminology
     2.2.  Outlier Detection
     2.3.  Knowledge Based Detection
     2.4.  Data Mesh
   3.  Elements of the Architecture
     3.1.  Service Inventory
     3.2.  SDD Configuration
     3.3.  Operational Data Collection
     3.4.  Operational Data Aggregation
     3.5.  Service Disruption Detection
     3.6.  Alarm
     3.7.  Postmortem
     3.8.  Replaying
   4.  Implementation Status
     4.1.  Cosmos Bright Lights
   5.  Security Considerations
   6.  Contributors
   7.  Acknowledgements
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

Today's highly virtualized, large-scale IP networks are challenging for network operations to monitor due to their vast number of dependencies. Humans are no longer able to manually verify all the dependencies end to end in a timely manner.

IP networks are the backbone of today's society. We individually depend on networks fulfilling the purpose of forwarding IP packets from a point A to a point B at any time of the day. A loss of such connectivity, even for a short period of time, today has manifold implications that can range from minor to severe. An interruption can lead to being unable to browse the web, watch a soccer game, access the company intranet or, even in life-threatening situations, no longer being able to reach emergency services. Further, congestion in the network leading to delayed packet forwarding can have severe repercussions on real-time applications.

Networks are generally deterministic. However, the usage of networks is only somewhat deterministic. Humans, taken as a large group of people, are somewhat predictable. There are time-of-day patterns in when we eat, sleep, work or pursue leisure, and these patterns may vary depending on age, profession and cultural background.

1.1.  Motivation

When operational or configurational changes in connectivity services are happening, it is crucial for network operators to detect interruptions within the network faster than the users utilizing the connectivity services.

In order to achieve this objective, automation in network monitoring is required.
The people operating the network are today simply outnumbered by the people using connectivity services.

This automation needs to monitor network changes holistically by supervising all three network planes simultaneously for a given connectivity service. The system needs to detect whether or not the operational changes are service disruptive, e.g., whether packets received from customers are no longer forwarded to the desired destination. A change in the control plane or the management plane indicates a network topology change, while a change in the forwarding plane describes how the packets are being forwarded. In other words, control and management plane changes can be attributed to network topology State changes, whereas the forwarding plane is related to the outcome of these network topology State changes.

Since changes in networks are happening all the time due to the vast number of dependencies, a scoring system is needed to indicate whether a change is considered disruptive. The scoring system needs to take into account the number of transport sessions, the number of affected flows and whether the detected interruptions are usual or exceptional.

1.2.  Scope

Such objectives can be achieved by applying checks on network-modeled time series data that contain semantics describing their dependencies across network planes. These checks can be based on domain knowledge or on outlier detection techniques. Domain-knowledge-based techniques apply the expertise of network engineers operating a network to understand whether there is an issue impacting the customer or not. Outlier detection techniques, on the other hand, identify measurements that deviate significantly from the norm and are therefore considered anomalous.
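
As an illustration of the two kinds of checks described above, the following Python sketch contrasts a knowledge-based rule with a simple statistical outlier check on a dropped-packet counter. The function names, the fixed limit of 100 drops per minute, the baseline values and the z-score limit of 3 are assumptions made for this example only; none of them is defined by this document.

```python
from statistics import mean, stdev
from typing import Sequence


def knowledge_based_check(drops_per_minute: float, limit: float = 100.0) -> bool:
    """Hypothetical engineer-written rule: flag anything above a fixed limit."""
    return drops_per_minute > limit


def outlier_check(baseline: Sequence[float], value: float,
                  z_limit: float = 3.0) -> bool:
    """Flag a value that lies more than z_limit standard deviations
    away from the mean of a baseline of recent measurements."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value deviates.
        return value != mu
    return abs(value - mu) / sigma > z_limit


# Illustrative baseline of dropped packets per minute (assumed values).
baseline = [3, 5, 2, 4, 6, 3, 5, 4, 2, 6]

print(knowledge_based_check(60))      # rule stays silent below the limit
print(outlier_check(baseline, 60))    # statistical check flags the deviation
```

In this sketch a measurement of 60 drops per minute passes the fixed rule but is flagged by the outlier check, because it lies far from the baseline mean of 4. This is one reason the two techniques are treated as complementary.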

The described scope does not take the connectivity service intent into account, nor does it verify whether the intent is being achieved at all times. Changes to the service intent that cause service disruptions are therefore treated as service disruptions, whereas monitoring systems that take the intent into account would consider them intended.

2.  Conventions and Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2.1.  Terminology

This document defines the following terms:

Outlier Detection:  A systematic approach to identifying rare data points that deviate significantly from the majority.

Service Disruption Detection (SDD):  The process of detecting a service degradation by discovering anomalies in network monitoring data.

Service Disruption Detection System (SDDS):  A system that performs SDD.

Additionally, this document makes use of the terms defined in [I-D.ietf-nmop-terminology] and [I-D.netana-nmop-network-anomaly-lifecycle].

The following terms are used as defined in [I-D.ietf-nmop-terminology]:

*  Resource

*  Event

*  State

*  Relevance

*  Problem

*  Symptom

*  Cause

*  Alarm

Figure 2 in Section 3 of [I-D.ietf-nmop-terminology] shows characteristics of observed operational network telemetry metrics.

Figure 4 in Section 3 of [I-D.ietf-nmop-terminology] shows relationships between state, relevant state, problem, symptom, cause and alarm.

Figure 5 in Section 3 of [I-D.ietf-nmop-terminology] shows relationships between problem, symptom and cause.

The following terms are used as defined in [I-D.netana-nmop-network-anomaly-lifecycle]:

*  False Positive

*  False Negative

2.2.  Outlier Detection

Outlier Detection, also known as anomaly detection, describes a systematic approach to identifying rare data points that deviate significantly from the majority. Outliers can manifest as a single data point or as a sequence of data points. There are multiple ways to classify anomalies in general, but in the context of this document the following three classes are taken into account:

Global outliers:  An outlier is considered "global" if its behavior is outside the entirety of the considered data set. For example, if the average dropped packet count is between 0 and 10 per minute and, in a small time window, the value reaches 1000, this data point is considered a global anomaly.

Contextual outliers:  An outlier is considered "contextual" if its behavior is within a normal (expected) range, but would not be expected based on some context. Context can be defined as a function of multiple parameters, such as time, location, etc. An example of a contextual outlier is a forwarded packet volume that reaches, overnight, levels which might be totally normal for the daytime but are anomalous and unexpected for the nighttime.

Collective outliers:  An outlier is considered "collective" if each single data point that is part of the anomaly is within expected ranges (so the points are not anomalous in either a contextual or a global sense), but the group, taking all the data points together, is anomalous. Note that the group can be formed within a single time series (a sequence of data points is anomalous) or across multiple metrics (e.g., when looking at two metrics together, the combined behavior turns out to be anomalous).
In Network Telemetry time series, one way this can manifest is that the number of network path and interface State changes coincides, as a group, with the time range in which the forwarded packet volume decreases.

For each outlier, a score between 0 and 1 is calculated. The higher the value, the higher the probability that the observed data point is an outlier. "Anomaly detection: A survey" [VAP09] gives additional details on anomaly detection and its types.

2.3.  Knowledge Based Detection

Knowledge-based anomaly detection, also known as rule-based anomaly detection, is a technique used to identify anomalies or outliers by comparing them against predefined rules or patterns. This approach relies on the use of domain-specific knowledge to set standards, thresholds, or rules for what is considered "normal" behavior. Traditionally, these rules are established manually by a knowledgeable network engineer. Looking forward, these rules can be expressed in human- and machine-readable form using ontologies (Section 5.3 of [I-D.mackey-nmop-kg-for-netops]). Such ontologies can be based on RFC-like documents which define network protocol behaviors and derived network Symptoms and patterns.

Additionally, in the context of network anomaly detection, the knowledge-based approach works hand in hand with the deterministic understanding of the network, which is reflected in network modeling. Components are organized into three network planes: the Management Plane, the Control Plane, and the Forwarding Plane [RFC9232]. A component can relate to a physical, virtual, or configurational entity, or to a sum of packets belonging to a flow being forwarded in a network.

Such relationships can be modeled in a Digital Map to automate that process.
[I-D.havel-nmop-digital-map-concept] defines the concepts for the Digital Map and [I-D.havel-nmop-digital-map] defines an application of the Digital Map to network topologies.

These relationships can also be modeled in a Knowledge Graph (Section 5 of [I-D.mackey-nmop-kg-for-netops]), where ontologies can be used to augment the relationships among different network elements in the network model.

2.4.  Data Mesh

The Data Mesh [Deh22] Architecture distinguishes between operational and analytical data. Operational data refers to data collected from operational systems, while analytical data refers to insights gained from operational data.

2.4.1.  Operational Network Data

In terms of network observability, the semantics of operational network metrics are defined by the IETF and are categorized, as described in the Network Telemetry Framework [RFC9232], into the following three network planes:

Management Plane:  Time series data describing the State changes and statistics of a network node and its Resources. For example, Interface State and statistics modeled in ietf-interfaces.yang [RFC8343].

Control Plane:  Time series data describing the State and State changes of network reachability. For example, BGP VPNv6 unicast updates and withdrawals exported in BGP Monitoring Protocol (BMP) [RFC7854] and modeled in BGP [RFC4364].

Forwarding Plane:  Time series data describing the forwarding behavior of packets and its data-plane context. For example, dropped packet count modeled in the IPFIX entities forwardingStatus (IE89) [RFC7270] and packetDeltaCount (IE2) [RFC5102] and exported with IPFIX [RFC7011].

2.4.2.  Analytical Observed Symptoms

The Service Disruption Detection process takes operational network data as input and generates analytical metrics describing Symptoms and outlier patterns of the connectivity service disruption.
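
To make the categorization of operational data by network plane in Section 2.4.1 more concrete, the following Python sketch tags each time series record with its plane and reports which planes saw activity in a given time window. The class, field and metric names are hypothetical and chosen for illustration only; they are not taken from the YANG, BMP or IPFIX definitions referenced above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Set


class Plane(Enum):
    MANAGEMENT = "management"
    CONTROL = "control"
    FORWARDING = "forwarding"


@dataclass(frozen=True)
class OperationalRecord:
    timestamp: int   # seconds, illustrative time base
    plane: Plane     # network plane the record belongs to
    metric: str      # illustrative label, not a standardized name
    value: float


def planes_with_activity(records: Iterable[OperationalRecord],
                         start: int, end: int) -> Set[Plane]:
    """Return the planes that reported at least one record in [start, end)."""
    return {r.plane for r in records if start <= r.timestamp < end}


# Assumed sample data: one record per plane.
records = [
    OperationalRecord(100, Plane.MANAGEMENT, "interface-state-change", 1),
    OperationalRecord(160, Plane.CONTROL, "bgp-withdrawals", 12),
    OperationalRecord(170, Plane.FORWARDING, "dropped-packets", 4200),
]

print(planes_with_activity(records, 150, 200))
```

Grouping records by plane within a shared time window is one possible way to supervise all three planes together for a connectivity service, which is the kind of input a Service Disruption Detection process would correlate.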

The observed Symptoms are categorized into a semantic triple [W3C-RDF-concept-triples]: action, reason, cause. The object is the action, describing the change in the network. The predicate is the reason, defining why this change occurred, and the subject is the cause, which defines what triggered that change.

Symptom definitions are described in Section 3 of [I-D.netana-nmop-network-anomaly-semantics] and outlier pattern semantics in Section 4 of [I-D.netana-nmop-network-anomaly-lifecycle]. Both are expressed in YANG information models.

However, the semantic triples could also be expressed with the Semantic Web Technology Stack in RDF, RDFS and OWL definitions, as described in Section 6 of [I-D.mackey-nmop-kg-for-netops]. Together with the ontology definitions described in Section 2.3, a Knowledge Graph can be created describing the relationship between the network state and the observed Symptom.

3.  Elements of the Architecture

A system architecture aimed at detecting service disruptions is typically built upon multiple components, for which design choices need to be made. In this section, we describe the main components of the architecture and delve into considerations to be made when designing such components in an implementation.

The system architecture is illustrated in Figure 1 and its main components are described in the following subsections.

+---------+                            +-------------------+
|Service  |                            | Alarm and         |                     |---
|Inventory|                            | Problem Management|                     |
|         |                            | System            |                     |
+---------+                            +-------------------+                     |
     |                                           ^ Stream                        |
     |                                           |                               |
     |       +---------+               +-------------------+                     |
     |       | Post-   |    Stream     | Message Broker    |                     |
     |       | mortem  |   <--------   | with Analytical   |                     |
     |       | System  |               | Network Data      |                     |
     |       +---------+               +-------------------+                     |
     |            |                              ^ Stream                        |
     |            |                              |                               |
     |            |                    +-------------------+                     |
     | Profile    | Fine               | Alarm Aggregation | Store Label         |
     | and        | Tune               | for Anomaly       | ------------|       |
     | Generate   | SDD                | Detection         |             |       |
     | SDD Config | Config             +-------------------+             |       |
     |            |                         ^    ^    ^ Stream           |       |
     v            v                         |    |    |                  v       |
+-------------------+                  +-------------------+        +---------+  |
| Service Disruption|     Schedule     | Service Disruption| Replay | Data    |  |
| Detection         |    --------->    | Detection         |<------ | Storage |  |
| Configuration     |    Detection     |                   |        |         |  |
+-------------------+                  +-------------------+        +---------+  |
  ^                                              ^ Stream              ^ ^ ^ ^   |
  |                                              |                     | | | |
  |                                    +---------+---------+             |
  |                                    | Network | Data    | Store       |
  |----------------------------------> | Model   | Aggr.   | ------------|
                                       |         | Process | Operational Data
                                       +---------+---------+
                                            ^    ^    ^ Stream
                                            |    |    |
                                       +-------------------+
                                       | Message Broker    |
                                       | with Operational  |
                                       | Network Data      |
                                       +-------------------+
                                            ^    ^    ^ Stream
Subscribe                          Publish  |    |    |
         +-------------------+         +-------------------+
         | Network Node with | ------> | Network Telemetry |
-------->| Network Telemetry | ------> | Data Collection   |
         | Subscription      | ------> |                   |
         +-------------------+         +-------------------+

         Figure 1: Service Disruption Detection Architecture

3.1.  Service Inventory

A service inventory is used to obtain a list of the connectivity services for which Anomaly Detection is to be performed. A service profiling process may be executed on the service in order to define a configuration of the service disruption detection approach and the parameters to be used.

3.2.  SDD Configuration

Based on this service list and potential preliminary service profiling, a configuration of the Service Disruption Detection is produced. It defines the set of approaches that need to be applied to perform SDD, as well as the parameters that are to be set when executing the algorithms performing SDD per se.

As the service lives on, the configuration may be adapted as a result of an evolution of the profiling being performed. Postmortem analyses are produced as a result of Events impacting the service, or of the occurrence of false positives raised by the Alarm system. These postmortem analyses can lead to improvements of the deployed profile parameters and to the creation of new customer profiles.

3.3.  Operational Data Collection

Collection of network monitoring data involves the management of the subscriptions to network telemetry on nodes of the network, and the configuration of the collection infrastructure to receive the monitoring data produced by the network.

The monitoring data produced by the collection infrastructure is then streamed through a message broker system for further processing.

Networks tend to produce extremely large amounts of monitoring data. To preserve scaling and reduce costs, decisions need to be made on the duration of retention of such data in storage, and at which level of storage they need to be kept. A retention time needs to be set on the raw data produced by the collection system, in accordance with their utility for further use. This aspect will be elaborated in further sections.

3.4.  Operational Data Aggregation

Aggregation is the process of producing data upon which detection of a service disruption can be performed, based on collected network monitoring data.

Pre-processing of collected network monitoring data is usually performed so as to produce input for the Service Disruption Detection component.
This can be achieved in multiple ways, depending on the architecture of the SDD component. As an example, the granularity at which forwarding data is produced by the network may be too fine for the SDD algorithms, and the data may instead be aggregated into a coarser dimension for SDD execution.

A retention time also needs to be decided upon for aggregated data. Note that the retention time must be set carefully, in accordance with the replay requirement discussed in Section 3.8.

3.5.  Service Disruption Detection

Service Disruption Detection processes the aggregated network data in order to decide whether a service is degraded to the point where network operations need to be alerted of an ongoing Problem within the network.

Two key aspects need to be considered when designing the SDD component. First, the way the data is being processed needs to be carefully designed, as networks typically produce extremely large amounts of data which may hinder the scalability of the architecture. Second, the algorithms used to make a decision to alert the operator need to be designed in such a way that the operator can trust that a targeted Service Disruption will be detected (no false negatives), while not spamming the operator with Alarms that do not reflect an actual issue within the network (false positives), which leads to Alarm fatigue.

Two approaches are typically followed to present the data to the SDD system. Classically, the aggregated data can be stored in a database that is polled at regular intervals by the SDD component for decision making. Alternatively, a streaming approach can be followed so as to process the data while they are being consumed from the collection component.

For SDD per se, two families of algorithms can be decided upon. First, knowledge-based detection approaches can be used, mimicking the process that human operators follow when looking at the data. Second, Machine Learning based outlier detection approaches can be used to detect deviations from the norm.
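
The streaming approach described above can be sketched as follows. This Python example keeps a bounded window of recent values for one aggregated metric and evaluates each new value as it is consumed. The class name, the window size, the minimum sample count and the z-score limit are assumptions made for illustration; a polling design would instead run the same comparison over values read from a database at each interval.

```python
from collections import deque
from statistics import mean, stdev


class StreamingDetector:
    """Evaluates one aggregated metric value at a time (hypothetical sketch)."""

    def __init__(self, window: int = 60, z_limit: float = 3.0,
                 min_samples: int = 5):
        # A bounded deque drops the oldest value automatically.
        self.history = deque(maxlen=window)
        self.z_limit = z_limit
        self.min_samples = min_samples

    def observe(self, value: float) -> bool:
        """Return True if value deviates strongly from the recent window."""
        anomalous = False
        if len(self.history) >= self.min_samples:
            values = list(self.history)
            mu = mean(values)
            sigma = stdev(values)
            if sigma > 0:
                anomalous = abs(value - mu) / sigma > self.z_limit
        if not anomalous:
            # Design choice for this sketch: flagged values are kept out of
            # the window so a disruption does not become the new baseline.
            self.history.append(value)
        return anomalous


detector = StreamingDetector(window=10)
# Assumed forwarded packet volume per interval; the drop to 5 is the anomaly.
stream = [100, 102, 98, 101, 99, 100, 5, 101]
flags = [detector.observe(v) for v in stream]
print(flags)
```

Because each value is evaluated on arrival, no database round trip sits between collection and decision. The trade-off in this sketch is that the detector only knows the values held in its window, so the window size bounds both memory use and the history available to the algorithm.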

3.5.1.  Network Modeling

Some input to SDD consists of established knowledge of the network that is unrelated to the dimensions according to which outlier detection is performed. For example, knowledge of the network infrastructure may be required to perform some service disruption detection. Such data need to be rendered accessible and updatable for use by SDD. They may come from inventories, or from automated gathering of data from the network itself.

3.5.2.  Data Profiling

As rules cannot be crafted specifically for each customer, they need to be defined according to pre-established service profiles. Processing of monitoring data can be performed in order to associate each service with a profile. External knowledge on the customer can also help in associating a service with a profile.

3.5.3.  Detection Strategies

For a profile, a set of strategies is defined. Each strategy captures one approach to looking at the data (as a human operator does) to observe whether an abnormal situation is arising. Strategies are defined as a function of observed outliers as defined in Section 2.2.

When one of the strategies applied for a profile detects a concerning global outlier or collective outlier, an Alarm needs to be raised.

Depending on the implementation of the architecture, a scheduler may be needed in order to orchestrate the evaluation of the Alarm levels for each strategy applied for a profile, for all service instances associated with such a profile.

3.5.4.  Machine Learning

Machine learning-based anomaly detection can also be seamlessly integrated into such an SDDS. Machine learning is commonly used for detecting outliers or anomalies. Typically, unsupervised learning is widely recognized for its applicability, given the inherent characteristics of network data.
Although machine learning requires a sizeable amount of high-quality data and considerable advanced training, the advantages it offers make these requirements worthwhile. The power of this approach lies in its generalizability, robustness, ability to simplify the fine-tuning process and, most importantly, its capability to identify anomaly patterns that might go unnoticed by a human observer.

3.5.5.  Storage

Storage may be required to execute SDD, as some algorithms may rely on historical (aggregated) monitoring data in order to detect anomalies. Careful consideration needs to be given to the level at which such data is stored, as slow access to such data may be detrimental to the reactivity of the system.

3.6.  Alarm

When the SDD component decides that a service is undergoing a disruption, a relevant-state change notification needs to be sent to the Alarm and Problem management system, as shown in Figure 4 in Section 3 of [I-D.ietf-nmop-terminology]. Multiple practical aspects need to be taken into account in this component.

When the issue lasts longer than the interval at which the SDD component runs, the relevant-state change mechanism should not create multiple notifications to the operator, so as not to overwhelm the management of the issue. However, the information provided along with the Alarm should be kept up to date during the full duration of the issue.

3.7.  Postmortem

Network Anomaly Detection     Symptoms
+-------------------+            &
|   +-----------+   |    Network Anomalies
|   | Detection |---|-------------+
|   |   Stage   |   |             |
|   +-----------+   |             v
+---------^---------+   +-------------------+     Labels      +------------+
          |             | Anomaly Detection |---------------->| Validation |
          |             |    Label Store    |<----------------|   Stage    |
          |             +-------------------+     Revised     +------------+
   +------------+                 |               Labels
   | Refinement |                 |
   |   Stage    |<----------------+
   +------------+    Historical Symptoms
                              &
                      Network Anomalies

         Figure 2: Anomaly Detection Refinement Lifecycle

Validation and refinement are performed during Postmortem analysis.

From an Anomaly Detection Lifecycle point of view, as described in [I-D.netana-nmop-network-anomaly-lifecycle], the Service Disruption Detection Configuration evolves over time, iteratively, looping over three main phases: detection, validation and refinement.

The Detection phase produces the Alarms that are sent to the Alarm and Problem Management System and, at the same time, stores the network anomaly and Symptom labels into the Label Store. This enables network engineers to review the labels in order to validate and edit them as needed.

The Validation stage is typically performed by network engineers reviewing the results of the detection and indicating which Symptoms and network anomalies have been useful for the identification of Problems in the network. The original labels from the Service Disruption Detection are analyzed and an updated set of more accurate labels is provided back to the Label Store.

The resulting labels are then fed back into the Network Anomaly Detection via its refinement capabilities: refinement updates the Service Disruption Detection configuration in order to improve the results of the detection (e.g., false positives, false negatives, accuracy of the boundaries, etc.).

3.8.  Replaying

When a service disruption has been detected, it is essential for the human operator to be able to analyze the data which led to the raising of an Alarm. It is thus important that an SDDS preserves both the data which led to the creation of the Alarm and human-understandable information on why the data led to the raising of an Alarm.

In early stages of operations, or when experimenting with an SDDS, it is common that the parameters used for SDD need to be fine-tuned. This process is facilitated by designing the SDDS architecture in a way that allows rerunning the SDD algorithms on the same input.

Data retention, as well as its level, needs to be defined so as not to sacrifice the ability to replay SDD execution, which serves to improve its accuracy.

4.  Implementation Status

Note to the RFC-Editor: Please remove this section before publishing.

This section records the status of known implementations.

4.1.  Cosmos Bright Lights

This architecture has been developed as part of a proof of concept that started in September 2022, first in a dedicated network lab environment and later, in December 2022, in Swisscom production to monitor a limited number of 16 L3 VPN connectivity services.

The architecture was first published at the Applied Networking Research Workshop at IETF 117 in the following academic paper: [Ahf23].

Since December 2022, 20 connectivity service disruptions have been monitored, and 52 false positives occurred. The false positives were due to the time series database temporarily not being real-time and to missing traffic profiling, which made comparison to the previous week inapplicable. Across the 20 connectivity service disruptions, 6 parameters were monitored; the service disruption was recognized by 1 parameter 3 times, by 2 parameters 8 times, by 3 parameters 6 times and by 4 parameters 2 times.

A real-time, streaming-based version has been deployed in Swisscom production as a proof of concept in June 2024, monitoring more than 12,000 L3 VPNs concurrently. Improved profiling capabilities are currently under development.

5.  Security Considerations

TBD

6.  Contributors

The authors would like to thank Alex Huang Feng, Ahmed Elhassany and Vincenzo Riccobene for their valuable contributions.

7.  Acknowledgements

The authors would like to thank Qin Wu, Ignacio Dominguez Martinez-Casanueva and Adrian Farrel for their review and valuable comments.

8.  References

8.1.  Normative References

[I-D.havel-nmop-digital-map]  Havel, O., Claise, B., de Dios, O. G., Elhassany, A., and T. Graf, "Modeling the Digital Map based on RFC 8345: Sharing Experience and Perspectives", Work in Progress, Internet-Draft, draft-havel-nmop-digital-map-02, 21 October 2024.

[I-D.havel-nmop-digital-map-concept]  Havel, O., Claise, B., de Dios, O. G., and T. Graf, "Digital Map: Concept, Requirements, and Use Cases", Work in Progress, Internet-Draft, draft-havel-nmop-digital-map-concept-00, 4 July 2024.

[I-D.ietf-nmop-terminology]  Davis, N., Farrel, A., Graf, T., Wu, Q., and C. Yu, "Some Key Terms for Network Fault and Problem Management", Work in Progress, Internet-Draft, draft-ietf-nmop-terminology-15, 30 March 2025.

[I-D.mackey-nmop-kg-for-netops]  Mackey, M., Claise, B., Graf, T., Keller, H., Voyer, D., Lucente, P., and I. D. Martinez-Casanueva, "Knowledge Graph Framework for Network Operations", Work in Progress, Internet-Draft, draft-mackey-nmop-kg-for-netops-02, 4 March 2025.

[I-D.netana-nmop-network-anomaly-lifecycle]  Riccobene, V., Roberto, A., Graf, T., Du, W., and A. H. Feng, "An Experiment: Network Anomaly Lifecycle", Work in Progress, Internet-Draft, draft-netana-nmop-network-anomaly-lifecycle-05, 3 November 2024.

[I-D.netana-nmop-network-anomaly-semantics]  Graf, T., Du, W., Feng, A. H., Riccobene, V., and A. Roberto, "Semantic Metadata Annotation for Network Anomaly Detection", Work in Progress, Internet-Draft, draft-netana-nmop-network-anomaly-semantics-04, 3 November 2024.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

[RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

[RFC9232]  Song, H., Qin, F., Martinez-Julia, P., Ciavaglia, L., and A. Wang, "Network Telemetry Framework", RFC 9232, DOI 10.17487/RFC9232, May 2022.

8.2.  Informative References

[Ahf23]  Huang Feng, A., "Daisy: Practical Anomaly Detection in large BGP/MPLS and BGP/SRv6 VPN Networks", IETF 117, Applied Networking Research Workshop, DOI 10.1145/3606464.3606470, July 2023.

[Deh22]  Dehghani, Z., "Data Mesh", O'Reilly Media, ISBN 9781492092391, March 2022.

[RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February 2006.

[RFC5102]  Quittek, J., Bryant, S., Claise, B., Aitken, P., and J. Meyer, "Information Model for IP Flow Information Export", RFC 5102, DOI 10.17487/RFC5102, January 2008.

[RFC7011]  Claise, B., Ed., Trammell, B., Ed., and P. Aitken, "Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of Flow Information", STD 77, RFC 7011, DOI 10.17487/RFC7011, September 2013.

[RFC7270]  Yourtchenko, A., Aitken, P., and B. Claise, "Cisco-Specific Information Elements Reused in IP Flow Information Export (IPFIX)", RFC 7270, DOI 10.17487/RFC7270, June 2014.

[RFC7854]  Scudder, J., Ed., Fernando, R., and S. Stuart, "BGP Monitoring Protocol (BMP)", RFC 7854, DOI 10.17487/RFC7854, June 2016.

[RFC8343]  Bjorklund, M., "A YANG Data Model for Interface Management", RFC 8343, DOI 10.17487/RFC8343, March 2018.

[VAP09]  Chandola, V., Banerjee, A., and V. Kumar, "Anomaly detection: A survey", ACM Computing Surveys, DOI 10.1145/1541880.1541882, July 2009.

[W3C-RDF-concept-triples]  Cyganiak, R., Wood, D., and M. Lanthaler, "W3C RDF concept semantic triples", W3 Consortium, February 2014.

Authors' Addresses

Thomas Graf
Swisscom
Binzring 17
CH-8045 Zurich
Switzerland
Email: thomas.graf@swisscom.com

Wanting Du
Swisscom
Binzring 17
CH-8045 Zurich
Switzerland
Email: wanting.du@swisscom.com

Pierre Francois
INSA-Lyon
Lyon
France
Email: pierre.francois@insa-lyon.fr

Alex Huang Feng
INSA-Lyon
Lyon
France
Email: alex.huang-feng@insa-lyon.fr