Visual Analytics for IP Transit & Peering Connections

17 May

Overview

Depending upon the size of the organization, network operations (NetOps) teams often include someone specifically tasked with capacity planning, traffic engineering, and/or peering management. Regardless of their exact title or role definition, these folks tend to be concerned with their networks’ IP transit and/or peering connections and are tasked with managing these connections to improve performance, reduce costs, or both.
In this post, we explore the programmability of NetSpyGlass to demonstrate how a few lines of Python code can enable granular, real-time and historical visibility into IP transit and peering connections in order to more effectively manage them.

The Context

Network operations teams acquire an intimate understanding of network traffic patterns and trends over time. Acquiring this understanding is an essential part of their job because it enables them to proactively rebalance traffic across the network to improve performance and/or to reduce costs. One key facilitator of this understanding is access to historical network traffic data to answer questions like:

How did the traffic volume to this ISP or class of ISPs look last year?
How does it compare to the traffic volume we have this year?
What patterns or trends in this traffic can we expect in the coming years?

Another key facilitator of this understanding is the ability to group, classify or tag traffic according to business-specific criteria. This enables the team to address questions like:

What is the total amount of traffic we send to our peers with whom we have paid peering contracts?
How does it compare to the amount of traffic we send to our peers with whom we have free peering?
How does it compare to the traffic volumes moving across our paid IP transit connections?

Hardly the result of a one-off or academic curiosity, these kinds of questions are ever-present. They are the bread-and-butter concern of network capacity planners, traffic engineers, and peering managers who can save their organizations a lot of money with timely and accurate answers to these kinds of questions. So, what’s the problem?

The Problem

The problem is that the tooling available to monitor and manage network devices tends to provide access to raw parametric data but typically doesn’t provide the calculated or aggregated metrics required to answer these kinds of questions. Nor do these tools provide a way to process the raw data into meaningful analytics. Doing so requires: 1) exhaustive discovery, mapping and monitoring of the network (i.e. all devices, interfaces, connections, etc.); 2) the ability to construct a data model of the network (using metadata) to facilitate analytical and computational rigor; 3) a means of accessing and manipulating the data model; and 4) an embedded analytical and computational capability. NetSpyGlass brings all these capabilities together to help network operators answer the above kinds of questions with ease. The following sections discuss each of these building blocks for detailed visibility into IP transit and peering connections.

Auto-Discovery & Mapping

Everything begins with NetSpyGlass polling devices over SNMP (v1, v2 and v3) to automatically discover vendor, model, protocols, component inventory, configuration and interfaces, and then mapping the Layer 2 network topology using the information collected from devices. Network topology is presented to the user in the form of interactive network maps. Users can build custom map views that show a subset of the discovered devices using various matching criteria.

“Tags” As Meta-Data

Next, a NetSpyGlass feature called “Tagging” enables meta-data to be attached to anything monitored in the network. For the questions above, tagging attaches meta-data to the monitoring variables that describe traffic through interfaces. For example, this meta-data could indicate for a particular interface “this is a transit interface” or “this is a free peering interface”. Users have the flexibility to apply any new or existing classification when defining a tagging schema; it’s entirely within their control. For their own operational purposes, users may already have defined interface descriptors that are well-suited for double-duty as tags in the NetSpyGlass system. Once tags are defined, users need only a few lines of Python script to set these tags throughout the network. Once interfaces are tagged, the corresponding monitoring variables are tagged automatically. Users can also build dashboards and alerts that retrieve variables by tags.

NetSpyGlass Query Language (NsgQL)

NetSpyGlass also features a proprietary query language, NsgQL, that is loosely based on SQL syntax and can be used to select monitoring variables, devices and components. NsgQL is used to build queries that access monitoring data, devices and components in NetSpyGlass, including by tag. For the above questions about IP transit and peering connections, the combination of tagging and NsgQL enables users to find interfaces using a query such as “select variable for outbound traffic where tag indicates the interface carries a BGP session with Level 3 and it is a transit circuit”. Another example NsgQL query could be “select variable for outbound traffic where tag indicates the interface connects to EQUINIX and is a free peering connection”.

Embedded Computational Capability

NetSpyGlass has an embedded Python interpreter that gives it tremendous computational/analytical capabilities. The embedded Python interpreter is accompanied by a collection of NetSpyGlass proprietary Python modules that implement a broad set of analytical functions such as performing calculations with monitoring data (in Python) and generating new metrics. And, when operating on lists of monitoring variables, Python scripts are used to modify or create new variables, add or modify tags on monitoring variables, trigger alerts and much more.

Putting It All Together With “Aggregates”

The network is now fully auto-discovered, mapped and tagged, and NetSpyGlass users have at their disposal fit-for-purpose query and computational capabilities with which to address the questions we started with. Using a few lines of Python script, users can now compute new variables (we call these Aggregates) whose values equal the sum of the values of the variables found by an NsgQL query such as the examples above.

For example, users can “compute the sum of values of variables for outbound traffic through interfaces tagged to indicate it is free peering”. Then, users can give this new variable (i.e. Aggregate) a name and have it persist within NetSpyGlass with real-time calculated values. Once this is done, users can build a graph with the value of this Aggregate variable for the past year and see the trend of free peering, compare it to the trend of paid peering (or transit) and in the end, see the big picture and make decisions that facilitate connection cost optimization.
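The idea of summing by tag can be illustrated with a few lines of plain Python. This is only a sketch that runs outside NetSpyGlass: the `Sample` class, the `aggregate()` helper, the tag names `Peering.Free` and `Peering.Paid`, and all device names and values are assumptions made for illustration, not part of the product's API.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """Latest value of one monitoring variable on one interface (hypothetical model)."""
    variable: str                 # e.g. "ifOutRate"
    device: str
    component: str
    value: float                  # bits per second
    tags: set = field(default_factory=set)


def aggregate(samples, variable, tag):
    """Sum the latest values of `variable` over all samples that carry `tag`."""
    return sum(s.value for s in samples if s.variable == variable and tag in s.tags)


# Made-up data: three outbound samples and one inbound sample.
samples = [
    Sample("ifOutRate", "rtr1", "xe-0/0/1", 2.0e9, {"Peering.Free"}),
    Sample("ifOutRate", "rtr2", "xe-0/0/3", 1.5e9, {"Peering.Free"}),
    Sample("ifOutRate", "rtr2", "xe-0/0/4", 4.0e9, {"Peering.Paid"}),
    Sample("ifInRate", "rtr1", "xe-0/0/1", 9.0e9, {"Peering.Free"}),
]

# Only the two outbound samples tagged as free peering contribute to the sum.
free_peering_out = aggregate(samples, "ifOutRate", "Peering.Free")
print(f"free peering outbound: {free_peering_out / 1e9:.1f} Gbit/s")
```

Computing the same sum for the paid peering tag gives the second series needed for the comparison described above.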

An important differentiating feature of NetSpyGlass is that these newly created variables can persist in the NetSpyGlass system like all other monitoring variables and are not mere graphical artifacts. Historical values for these Aggregate variables are stored in the NetSpyGlass time-series database (TSDB) and are available for creating alerts, graphs, dashboards, reports or even for creating new, more complex Aggregates. The following discussion generalizes the concept of Aggregate variables and then illustrates the computation of Aggregate variables for transit and peering interfaces.

How Aggregates are Calculated: Python Code Walk-Through

The Python code below illustrates how Aggregates are calculated. For a more in-depth understanding of how the NetSpyGlass Python interpreter accesses monitoring variables, refer to the NetSpyGlass documentation.

Beginning with code statements at lines 3-4, we import proprietary NetSpyGlass Python modules required for calculating (3) Aggregate variables:

interface traffic in (ifInRate)
interface traffic out (ifOutRate)
interface speed (ifSpeed)

for (2) kinds of interfaces:

“transit”
“private peering”

Network interfaces are identified by the tag ifDescription with values PRIV (private peering) and TS (transit). These tags are the result of a simple Python hook script that parses discovered interface descriptions to extract keywords that are then converted into tags on each interface.
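A hook of this kind might look roughly like the sketch below. It is a standalone illustration and not the NetSpyGlass hook API: the function name `tags_from_description`, the way descriptions are split into words, and the sample descriptions are assumptions. Only the tag facet ifDescription and the keywords PRIV and TS come from the text above.

```python
from __future__ import annotations

import re

# Keywords to turn into tags (from the post: PRIV = private peering, TS = transit).
KNOWN_KEYWORDS = {"PRIV", "TS"}


def tags_from_description(description: str) -> set[str]:
    """Return tags such as 'ifDescription.TS' for keywords found in a description."""
    # Split on anything that is not a letter or digit, so "TS:" and "[PRIV]" both match.
    words = {w.upper() for w in re.split(r"[^A-Za-z0-9]+", description) if w}
    return {f"ifDescription.{kw}" for kw in words & KNOWN_KEYWORDS}


if __name__ == "__main__":
    for descr in ("TS: transit to upstream A", "[PRIV] peering with partner B", "core uplink"):
        print(descr, "->", sorted(tags_from_description(descr)))
```

Matching whole words rather than substrings avoids tagging an interface just because "ts" happens to appear inside a longer word in its description.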

For each set of interfaces that match one of the two types (PRIV or TS), (3) new Aggregate variables are computed. There can be many interfaces with tag ifDescription.PRIV or ifDescription.TS across the network. The most recent values of the corresponding monitoring variable (ifInRate, ifOutRate or ifSpeed) on those interfaces are added together and stored in a new instance of a variable with the same name but with a new “device” association (i.e. Transit or Private). These “devices” associated with Aggregate variables do not exist as actual, physical devices in the network. We therefore refer to them as “pseudo-devices” because they are synthetic, an abstraction.

Let’s digress just a bit to understand why “pseudo devices” are created. Most fundamental to know is that all NetSpyGlass variables (including Aggregates) are identified by a 3-part handle consisting of:

variable name
device
component

Thus, all NetSpyGlass variables refer to a device/component pair. An example of this for a regular (non-Aggregate) variable would be:

variable name = ifInRate (Interface In Rate)
device = Router
component = Interface

Here we see that the device/component pair to which the variable refers would physically exist in the network.

The same pattern applies to Aggregate variables with the exception that the device/component pair to which the variable refers does not physically exist in the network:

variable name = (ifInRate or ifOutRate or ifSpeed)
device = (Transit or Private)
component = Aggregate

There are no physical “Transit” or “Private” devices in the network. So we instantiate a synthetic (or pseudo) device/component pair to satisfy requirements applicable to all variables in NetSpyGlass, i.e. there has to be a device/component pair to which a variable refers.
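The 3-part handle can be pictured as a simple tuple type. The sketch below is only an illustration of the data model described above: the `VarHandle` type, the device name `router1`, the component name `eth0` and the sample values are assumptions, not NetSpyGlass objects.

```python
from typing import NamedTuple


class VarHandle(NamedTuple):
    """Three-part identifier: variable name, device, component (hypothetical type)."""
    name: str
    device: str
    component: str


# A regular variable: the device/component pair physically exists in the network.
physical = VarHandle("ifInRate", "router1", "eth0")

# An Aggregate variable: same shape, but the device/component pair is synthetic.
synthetic = VarHandle("ifInRate", "Transit", "Aggregate")

# Because both kinds share one key shape, a single store can hold them side by side.
latest = {physical: 1.2e9, synthetic: 7.5e9}
for handle, value in latest.items():
    print(handle.name, handle.device, handle.component, value)
```

Keeping one key shape for both kinds of variable is what lets graphs, alerts and queries treat Aggregates exactly like variables collected from real devices.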

The processing described here is performed by function agg_var() (code block lines 10-22). This function processes tag information to instantiate the pseudo device/component pair. We create pseudo-device “Transit” or “Private” depending upon what is used as the last argument in the call to function self.agg_var() (code statements lines 32-34 and 38-40) which is called from execute(). And, we then create pseudo-component “Aggregate” (code statement line 18, second argument in the call to new_var() ) to satisfy our requirement for all variables and to provide users with a single, unified data model for all variables in the NetSpyGlass system.

The pseudo device/component names created here also appear in NetSpyGlass’s Graphic Workbench, in Grafana and elsewhere in the NetSpyGlass system. For example, the screenshot (below) produced by Graphic Workbench illustrates how pseudo-device names enable easy and intuitive presentation of real-time and historical calculated values. In this particular instance the values describe interface performance, but the capability can be applied with equal ease to any monitored entity in the network.

Also important to note here is that NetSpyGlass enables users to apply tags (in Python script) even to Aggregate variables. For example, code statement line 21, aggr.addTag('Explicit.Aggregate'), applies the Explicit.Aggregate tag to Aggregate variables, which enables users to easily find and use these variables to create graphs (as is done in the Graphic Workbench screenshot below), or to create alerts, reports, dashboards and much more.

So we have (3) Aggregate variables (ifInRate, ifOutRate and ifSpeed) and (2) interface types. We are calculating a total of (6) Aggregate variables. In the script above, the (6) code statements at lines 32-34 and 38-40 perform these Aggregate calculations for transit and private peering interfaces respectively.

NetSpyGlass requires this script to define a class that derives from nw2rules.Nw2Rules and has a function execute() (code block lines 24-40).

NetSpyGlass servers run function execute() on every monitoring cycle after all data has been collected. Code statement line 25 annotates the system log to indicate that Aggregate variables are being computed.
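The structure described in this walk-through can be approximated outside NetSpyGlass with the sketch below. It is not the original script and does not use the real NetSpyGlass modules: `RulesBase` is a stand-in for nw2rules.Nw2Rules, plain dictionaries replace the product's variable objects, and the signature of `agg_var()` is an assumption. Only the names execute() and agg_var(), the ifDescription tags, the pseudo-device names and the Explicit.Aggregate tag are taken from the text. Line numbers cited in the walk-through refer to the original script, not to this sketch.

```python
class RulesBase:
    """Stand-in for nw2rules.Nw2Rules so that this sketch runs on its own."""

    def __init__(self, interface_vars):
        # interface_vars: list of dicts with keys name, device, component, value, tags
        self.interface_vars = interface_vars
        self.aggregates = {}   # (name, pseudo-device, "Aggregate") -> dict


class AggregateRules(RulesBase):
    def agg_var(self, var_name, tag, pseudo_device):
        """Sum the latest values of var_name over interfaces that carry tag."""
        total = sum(v["value"] for v in self.interface_vars
                    if v["name"] == var_name and tag in v["tags"])
        # Store the result under a synthetic device/component pair and tag it.
        self.aggregates[(var_name, pseudo_device, "Aggregate")] = {
            "value": total,
            "tags": {"Explicit.Aggregate"},
        }

    def execute(self):
        """Run once per monitoring cycle: 3 variables x 2 interface types."""
        for tag, pseudo_device in (("ifDescription.TS", "Transit"),
                                   ("ifDescription.PRIV", "Private")):
            for var_name in ("ifInRate", "ifOutRate", "ifSpeed"):
                self.agg_var(var_name, tag, pseudo_device)


def iface(name, device, component, value, tag):
    """Build one made-up interface variable for the demo."""
    return {"name": name, "device": device, "component": component,
            "value": value, "tags": {tag}}


rules = AggregateRules([
    iface("ifInRate", "rtr1", "xe-0/0/0", 3.0e9, "ifDescription.TS"),
    iface("ifInRate", "rtr2", "xe-0/0/0", 1.0e9, "ifDescription.TS"),
    iface("ifInRate", "rtr1", "xe-0/0/5", 2.0e9, "ifDescription.PRIV"),
    iface("ifSpeed", "rtr1", "xe-0/0/0", 1.0e10, "ifDescription.TS"),
])
rules.execute()
for key, agg in sorted(rules.aggregates.items()):
    print(key, agg["value"])
```

Each call to execute() recomputes all (6) Aggregates from the latest interface values, which mirrors the per-cycle behavior described above.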

Aggregates Use Case: Visual Analytics for IP Transit & Peering Connections

Now, let’s explore how to access and use what we have created with the Python script above. We do this with a discussion of the graphic (below), a screen capture taken from NetSpyGlass’s Graphic Workbench feature which the user accesses directly from the user interface by selecting “Graphs” from the vertical menu bar on the left side of the display.

In our current use case, “ifInRate” is a variable representing the rate of ingress traffic on the interface. The user could also have filtered the results by device name, component name, component description or even by tags. In this example, the user has chosen to graph the values of the variable ifInRate for transit and peering interfaces that match the tag Explicit.Aggregate over a seven-day period. Recall from the preceding discussion of the Python script that we used a single line of code at line 21, aggr.addTag('Explicit.Aggregate'), to tag newly created Aggregate variables. Here we can see that the tagging facilitates filtering (finding and selecting) only the variables of interest to show in Graphic Workbench.

The table shown at the bottom of the graphic shows Current, Min, Max values together with Units for the Aggregate. At a high level, this discussion illustrates the ease with which network operators can use NetSpyGlass to formulate and visualize analytics describing traffic patterns across their IP transit and peering connections.

[Figure: Visual-Analytics-Blog-Fig2, Graphic Workbench screenshot]

Object-Oriented Design Meets Network Monitoring Automation

Notice in the graphic above the table column headers labeled “Device” and “Component”. This follows directly from the preceding discussion of the data model for all variables in NetSpyGlass. Here, in the above graph, the variable name is ifInRate. The Device column lists “Transit” and “Peering” pseudo-devices that have also been tagged “Explicit.Aggregate” during instantiation (refer to the Python Code Walk-Through above). The Component column confirms that for each pseudo-device we also have a pseudo-component with designation “Aggregate”, as is our convention when creating Aggregate variables. Finally, the numerical values shown on the right side of the table represent the rate of ingress traffic on “pseudo-interfaces” of “Transit” and “Peering” pseudo-devices.

NetSpyGlass uniquely provides users with an object-oriented abstraction layer that enables them to instantiate objects in the network data model to represent almost any item of interest that is otherwise not directly accessible, observable or measurable. In this post, we have seen how “pseudo-devices” are instantiated as objects representing “Transit” and “Peering” items of interest by virtue of the tagging schema. And, we have seen how “pseudo-components” are instantiated as “Aggregates” to reflect the fact that the associated variable (ifInRate) contains calculated values representing a specific property of the corresponding object (in this case, the Aggregate represents the ingress line rate ‘property’ of the Transit/Peering ‘object’).

In a nutshell, this illustrates how NetSpyGlass’s programmability via its embedded Python interpreter enables users to apply object-oriented design principles to network monitoring automation. The combination of tags and Python scripts enables users to extend the network data model to encompass any aspect of the network of particular interest. This is accomplished by creating objects to represent those aspects and defining Aggregates to represent properties of those objects as calculated values of unlimited computational complexity.

Summary

Programmability conquers complexity and provides a powerful foundation for network monitoring automation. In this post, we explored a compelling use case for some of the unique features of NetSpyGlass that easily give users visibility into IP transit and peering connections in order to more effectively manage those connections, i.e. improve network performance and/or reduce costs.

These features include: auto-discovery and mapping of network topology; tagging to attach meta-data to anything monitored in the network; SQL-like query capabilities (NsgQL) to select monitoring variables, devices and components, including selection by tags; and an embedded Python interpreter with proprietary modules providing programmable computational and analytical capabilities.

We introduced the concept of “Aggregates” as variables containing calculated values of unlimited computational complexity that typically represent some aspect of the network that is of interest but that is not directly observable, such as the total rate of ingress traffic across a category of interfaces. We showed how Aggregates give NetSpyGlass users visual analytics for their IP transit and peering connections. And finally, we took a walk through an example Python script to illustrate how Aggregates are calculated.

The NetSpyGlass features discussed in this post enable innumerable use cases in network monitoring automation. Although the focus here has been transit and peering connections, the same pattern gives Aggregates virtually unlimited practical utility. For example, users could easily monitor power consumption by calculating the total current draw for a rack. Just tag the devices that represent PDUs with the rack number and then compute Aggregates as we have done in this post. Here, however, we would match the rack number tag to select input variables instead of ifDescription as we do above. Devices that represent PDUs could be easily tagged with rack numbers, especially if their names include this information as part of the user’s naming convention.
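As a rough illustration of the rack example, the sketch below derives a rack tag from each PDU name and sums the current draw per rack. Everything here is assumed for the purpose of the example: the naming convention (`pdu-r12-a`), the tag format `Rack.12`, the helper names and the readings. It runs as plain Python and does not use NetSpyGlass modules.

```python
import re
from collections import defaultdict

# Assumed naming convention: the rack number follows "-r", e.g. "pdu-r12-a".
RACK_IN_NAME = re.compile(r"-r(\d+)-")


def rack_tag(device_name):
    """Return a tag such as 'Rack.12', or None if the name has no rack number."""
    match = RACK_IN_NAME.search(device_name)
    return f"Rack.{match.group(1)}" if match else None


def current_by_rack(readings):
    """Sum current draw (amperes) over all PDUs that share a rack tag."""
    totals = defaultdict(float)
    for device_name, amps in readings:
        tag = rack_tag(device_name)
        if tag is not None:          # skip devices that do not follow the convention
            totals[tag] += amps
    return dict(totals)


# Made-up readings: (PDU device name, current draw in amperes).
pdu_readings = [
    ("pdu-r12-a", 11.5),
    ("pdu-r12-b", 9.25),
    ("pdu-r13-a", 14.0),
    ("core-switch-1", 3.0),   # not a PDU name, so it is ignored
]

rack_totals = current_by_rack(pdu_readings)
for rack, amps in sorted(rack_totals.items()):
    print(f"{rack}: {amps:.2f} A")
```

The only change from the transit and peering case is the tag used to group the input variables; the summation itself is the same.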

In upcoming posts, we will explore additional use cases with particular emphasis on NetSpyGlass’s programmability as the enabler of easy, effective and powerful network monitoring automation. If you face challenges with your current approach to monitoring your network, share your thoughts below and maybe one of our upcoming posts can provide some helpful guidance.