SDN Resiliency and Security: Modeling and Analytics

Modeling emerging networks offers network designers guidance on resilience and may make it possible to identify threats before they emerge. Abstracted network models are used in a wide range of domains, including biology, cognitive science, physics, and infrastructure, to identify characteristics of multilayered networks. Bipartite and tripartite models are potentially the most promising (and somewhat underutilized) tools for modeling emergent networks. Specifically, these models can offer insight into the robustness of networks under arbitrary failures, and they can address the structures emerging in new networks. They are particularly well suited to a confluence of traditional networks and software-defined networks (SDNs) in which SDN components are instantiated on shared hardware. Preliminary results on a simple topology show the ability to model cascading failures. A straightforward extension of the bipartite model into a tripartite network representation of a simple software-defined network (with controllers, switches, and data shown as separate) is shown. This enables first-order modeling of cascading failures when a virtual topology has interdependencies in the physical network (e.g., multiple switches on one physical box).

One of the key characteristics of a network structure is how gracefully it fails under different conditions. Current network architectures have evolved into a network that has been operationally robust against failures and errors, but less resilient against concentrated attacks. Indeed, some argue that the network's fault tolerance has resulted in a vulnerability to targeted attacks. However, these results are based on topological models and fail to capture many of the real-world failure and recovery dynamics of these networks.

Static measures of failure are captured in attack and fault tolerance metrics. In these regards, the physical structure of the Internet and the virtual network built on top of it, the World-Wide Web (WWW), as currently defined, have similar topological properties. There has been considerable effort in mapping and understanding the topological structure of the Internet and the WWW in order to understand their resiliency in the face of attacks and random errors. The topological properties of the Internet and the WWW have made them robust against random faults, but potentially susceptible to targeted attacks.
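The contrast between random faults and targeted attacks can be illustrated with a minimal sketch. The two-hub topology, the node names, and the helper names below are hypothetical and chosen only for illustration; the static metric used is the size of the largest connected component after a node is removed.

```python
from collections import deque

def largest_component(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)  # treat removed nodes as already visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best

def build_two_hub_network():
    """Hypothetical topology: two linked hubs, each serving four leaf nodes."""
    adj = {}
    def link(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    link("hubA", "hubB")
    for i in range(4):
        link("hubA", f"a{i}")
        link("hubB", f"b{i}")
    return adj

network = build_two_hub_network()

# Random fault (assumed here to hit a leaf, the most common node type).
after_fault = largest_component(network, {"a0"})

# Targeted attack: remove the highest-degree node (ties broken by name).
target = max(network, key=lambda n: (len(network[n]), n))
after_attack = largest_component(network, {target})

print(after_fault, after_attack)
```

In this toy case the leaf failure leaves nine of the ten nodes connected, while removing one hub cuts the largest component to five nodes. The sketch captures only the static, topological view discussed above.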

Moreover, early analyses of resiliency failed to consider the dynamics of failure. One way of measuring these dynamic properties is through cascading failure. A cascading-failure analysis takes into account the additional network failures triggered by a small initial failure.
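As an illustration of how a small initial failure can propagate, the following is a minimal sketch of one common style of cascade model, load redistribution. It is an assumption made for illustration, not necessarily the model used in the work described here: each node carries a load, a failed node's load is split equally among its surviving neighbors, and any node pushed above its capacity fails in the next round. The chain topology and the load and capacity values are hypothetical.

```python
def cascade(adj, load, capacity, initial):
    """Return the set of nodes that have failed once the cascade stops."""
    load = dict(load)        # work on a copy so the caller's data is untouched
    failed = set()
    frontier = {initial}     # nodes failing in the current round
    while frontier:
        failed |= frontier
        for node in sorted(frontier):
            alive = [n for n in adj[node] if n not in failed]
            for n in alive:
                # split the failed node's load equally among surviving neighbors
                load[n] += load[node] / len(alive)
        # nodes pushed over capacity fail in the next round
        frontier = {n for n in adj if n not in failed and load[n] > capacity[n]}
    return failed

# Hypothetical four-node chain: a - b - c - d
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
load = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
capacity = {"a": 3.0, "b": 1.5, "c": 2.2, "d": 5.0}

from_a = cascade(adj, load, capacity, "a")  # overloads b, which then overloads c
from_d = cascade(adj, load, capacity, "d")  # c absorbs d's load; no cascade

print(sorted(from_a), sorted(from_d))
```

The same topology yields very different outcomes depending on where the initial failure occurs, which is the dynamic behavior that static metrics miss.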

Analysis of the effects of cascading failures has been used to examine structure in metabolic networks and electrical systems. The effects of cascades have also been modeled in the study of communication networks, such as the AS-level topology of the Internet. Studies of cascading failure should incorporate both the topological and the dynamic properties of the system. If done well, these models suggest possible methods of mitigating the risks of cascades. Ideally, studies that model cascades can make these failures less likely and recovery less difficult.

Only recently has the study of network resilience, in both its static and dynamic measures, begun to examine the effects of interdependent networks. Much of the work on cascades in networks of networks has been done in the physics community. power law critique; star networks.

There is also work on communication, power infrastructure, and travel networks. Work on interdependent and multi-mode networks shows that analysis of aggregated or projected unipartite networks does not correspond to the results of the more refined (though still evolving) methods of multinetwork analysis.
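A small sketch can show why a projected unipartite network may lose information that matters for failure analysis. The example below is hypothetical: it assumes a bipartite structure linking elements (s1 to s3) to the shared resources they depend on, and compares two different bipartite structures that produce the same projection.

```python
from itertools import combinations

def project(memberships):
    """Unipartite projection: link two elements if they share any resource."""
    edges = set()
    for members in memberships.values():
        for u, v in combinations(sorted(members), 2):
            edges.add((u, v))
    return edges

def worst_single_failure(memberships):
    """Largest number of elements touched by the failure of one resource."""
    return max(len(members) for members in memberships.values())

# Case 1 (hypothetical): all three elements depend on one shared box.
shared_box = {"box1": {"s1", "s2", "s3"}}

# Case 2 (hypothetical): each pair of elements shares a different box.
pairwise = {
    "boxA": {"s1", "s2"},
    "boxB": {"s2", "s3"},
    "boxC": {"s1", "s3"},
}

same_projection = project(shared_box) == project(pairwise)
print(same_projection)                      # both projections are a triangle
print(worst_single_failure(shared_box))     # one box failure touches 3 elements
print(worst_single_failure(pairwise))       # one box failure touches 2 elements
```

Both structures project to the same triangle of elements, yet their exposure to a single resource failure differs, so an analysis performed only on the projection cannot distinguish them.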

Few of these works, however, incorporate specific attributes of the physical systems being modeled. While ignoring specific qualities can be useful for studying the simplified, abstract statistics and dynamics of a network topology, topological data alone have been found to be inadequate for predicting real-world cascades in power systems. Moreover, both recent and earlier models focus on different aspects of network robustness. For example, work on abstract networks tends to focus on node failures, while work on physical systems looks primarily at link failures. Ideally, for robustness analysis of software-defined networks, we would want to incorporate both node and edge failures in our model.

The questions that need to be asked revolve around the resiliency and robustness of software-defined networks under different conditions. Additionally, there are issues of control-plane injection attacks and cross-application interactions. In BGP networks, because data and control elements are unified, network analysis can be done using well-defined techniques. SDNs, by contrast, must be represented in a different manner due to the separation of data and control elements.

The separate data and control network structures in an SDN create a network of interdependent networks, which must be modeled as such. In addition, there is interdependency with the physical network infrastructure. For example, if one physical machine fails, many virtual forwarding elements will fail simultaneously. Data forwarding elements determine where to send data based on information from the control elements. This procedure looks much like a metabolic network in which metabolite nodes are connected to reaction nodes. If the metabolite is a reactant, it has an outgoing edge to a reaction node, but if it is a product of a reaction, it has an incoming edge from that reaction, allowing a node to be described by its in- and out-degrees.
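The interdependency described above can be sketched as a small layered model. The layout below is hypothetical, and the rule it applies is an assumption made for illustration: an element is up only if its physical box is up, and a forwarding element stays operational only if at least one of its control elements is also up.

```python
def surviving_switches(hosted_on, controlled_by, failed_boxes):
    """Forwarding elements still operational after the given physical boxes fail.

    hosted_on:     element name -> physical box it runs on
    controlled_by: forwarding element -> set of control elements it can use
    """
    # Layer 1: every element on a failed box goes down with it.
    up = {elem for elem, box in hosted_on.items() if box not in failed_boxes}
    # Layer 2: a forwarding element also needs at least one live controller.
    return {
        sw for sw, controllers in controlled_by.items()
        if sw in up and any(c in up for c in controllers)
    }

# Hypothetical layout: controller c1 and switches s1, s2 share box1.
hosted_on = {
    "c1": "box1", "s1": "box1", "s2": "box1",
    "s3": "box2",
    "c2": "box3", "s4": "box3",
}
controlled_by = {
    "s1": {"c1"},
    "s2": {"c1", "c2"},
    "s3": {"c1"},
    "s4": {"c2"},
}

print(sorted(surviving_switches(hosted_on, controlled_by, {"box1"})))
print(sorted(surviving_switches(hosted_on, controlled_by, {"box2"})))
```

In this sketch, losing box1 takes down s1 and s2 directly and also disables s3, whose own hardware is healthy but whose only controller ran on the failed box. Losing box2 affects s3 alone.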

There are some differences, as data forwarding elements can only transmit data to other elements to which they are physically connected. Also, unlike in metabolic networks, the control elements in an SDN share information about network structure. Furthermore, most connections in a metabolic network are one-way, while in SDNs connections can be defined in numerous ways. Still, the analogy is useful: removing a control element means the forwarding element loses the ability to send data to the correct location. Removing a critical forwarding element means that even when control units are aware of where the data should go, they are unable to produce a connecting element. The techniques for analyzing bipartite cascading failures should therefore still be viable, with additional modifications.
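The two failure modes in this analogy can be sketched together. The example below is a simplified illustration under stated assumptions: each forwarding element has exactly one control element, a forwarding element can relay data only while both it and its controller are up, and data may travel only over physical links. The ring topology and all names are hypothetical.

```python
from collections import deque

def can_deliver(links, controller_of, src, dst, failed=frozenset()):
    """True if data can travel from src to dst over usable forwarding elements."""
    def usable(sw):
        # A forwarding element needs to be up and to have a live controller.
        return sw not in failed and controller_of[sw] not in failed

    if not (usable(src) and usable(dst)):
        return False
    seen = {src}
    queue = deque([src])
    while queue:
        sw = queue.popleft()
        if sw == dst:
            return True
        for nxt in links.get(sw, ()):   # only physically connected neighbors
            if nxt not in seen and usable(nxt):
                seen.add(nxt)
                queue.append(nxt)
    return False

# Hypothetical ring of four forwarding elements with two controllers.
links = {
    "s1": ["s2", "s4"],
    "s2": ["s1", "s3"],
    "s3": ["s2", "s4"],
    "s4": ["s1", "s3"],
}
controller_of = {"s1": "c1", "s2": "c1", "s3": "c2", "s4": "c2"}

print(can_deliver(links, controller_of, "s1", "s3"))                  # healthy network
print(can_deliver(links, controller_of, "s1", "s3", {"s2"}))          # reroutes via s4
print(can_deliver(links, controller_of, "s1", "s3", {"s2", "s4"}))    # no connecting element
print(can_deliver(links, controller_of, "s1", "s3", {"c1"}))          # s1 has lost its controller
```

Removing one forwarding element is tolerated because an alternate physical path exists, whereas removing both intermediate elements, or the controller of the source, blocks delivery even though the remaining hardware is intact.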