Bringing NMS and Flow Telemetry together, one release at a time.
Today, we're sharing the first step in a journey to seamlessly integrate Kentik NMS with our Flow platform. This is just the beginning of a series of iterations that will bring them together in a more cohesive and powerful way.
Read on as we show you a new and easy way to visually correlate NMS charts with Traffic data.
Metrics Explorer vs. Data Explorer
A novel take on an existing type of product, Kentik NMS builds on what made our Flow Telemetry platform a hit: open exploration. It does so through Metrics Explorer, the little brother of the award-winning approach you know and love in Data Explorer. In other words, while Data Explorer is the business intelligence (BI) platform for your network traffic data, Metrics Explorer is the BI platform for your SNMP or Streaming Telemetry data.
When we launched Kentik NMS, our goal was to marry an NMS with world-class Traffic Analysis to provide our customers with the most cutting-edge and useful network observability platform available. Since then, we’ve learned a lot about how our initial users work with it, and we took some notes:
- Not everyone who has gained Metrics Explorer expertise in NMS is comfortable with Data Explorer, especially since the latter has become extremely feature-rich after years of successive improvements
- A lot of troubleshooting workflows follow the same pattern: identify a peak or a trough on a chart, then inspect traffic to investigate what might be contributing to it. Rinse and repeat: this is very often an iterative process
Correlation, Causation, AI, and the Network Engineer
Recent times have marked the rise of ML/AI, with every product (Kentik is no exception) showing you machine-learned insights about something you did not know about your network.
At the same time, we are regularly reminded that correlation does not equal causation, as absurdly illustrated in the meme below:
Yet, years of practitioner experience in this industry tell us that the vast majority of network troubleshooting activities end up with someone trying to explain a bump or a trough on one chart by looking at other charts to identify a probable root cause.
In this process, the network engineer is better equipped when the UI/UX makes it easy to quickly eyeball multiple charts stacked on top of each other, with a perfectly aligned time range.
So, we took a few simple use cases and iterated to provide a UI that helps the network engineer correlate SNMP and Traffic charts:
- "A port on a device is running hot, what could be the reason?"
- "What could be the reason behind the CPU of this router peaking?"
Often, what we noticed was that the right tool was mostly about allowing users to quickly iterate through hypotheses, going from one finding to the next and quickly ruling out dead ends. With this as the target user methodology, we came up with the small but powerful capability described in the next section.
Introducing the Metrics Explorer bottom drawer
In Metrics Explorer, you may now notice a little kebab menu at the end of each row. If your query yields a Site, Device, or Interface, you will now be offered a contextual menu that lets you summon a traffic breakdown for that specific row:
Selecting any of these entries will summon the "Dimensions Selector", which lets you choose any set of up to 8 traffic dimensions to break traffic down for this Site, Device, or Interface. Here's an example selecting Source IP and Destination IP for this device. As you can see:
- a bottom drawer opens up with a nested Data Explorer traffic query that's perfectly lined up with the Metrics Explorer one to facilitate visual correlation
- this drawer can be minimized or discarded, or you can open a new tab with this very query pre-populated by clicking "Open in Data Explorer"
- discarding the bottom drawer to replace it with a new set of traffic query dimensions is also straightforward, allowing for fast-paced troubleshooting iteration
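For readers who think in code, the drawer's behavior can be pictured as deriving a traffic query from the selected Metrics Explorer row. The sketch below is purely illustrative and does not use any Kentik API: `MetricsRow`, `TimeRange`, `build_traffic_query`, and the dictionary layout are all hypothetical names chosen for this example. The only details taken from this post are the three supported row types, the limit of 8 dimensions, and the fact that the traffic query shares the time range of the metrics query.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# The post states that up to 8 traffic dimensions can be selected.
MAX_DIMENSIONS = 8
# Row types for which the contextual menu is offered, per the post.
SUPPORTED_KINDS = {"site", "device", "interface"}


@dataclass(frozen=True)
class MetricsRow:
    """Hypothetical stand-in for one row of a Metrics Explorer result."""
    kind: str  # "site", "device", or "interface"
    name: str


@dataclass(frozen=True)
class TimeRange:
    """Hypothetical time window shared by the metrics and traffic charts."""
    start: datetime
    end: datetime


def build_traffic_query(row, time_range, dimensions):
    """Return a dict describing a traffic breakdown scoped to one row.

    The traffic query reuses the metrics query's time range unchanged,
    which is what keeps the two charts lined up for visual correlation.
    """
    if row.kind not in SUPPORTED_KINDS:
        raise ValueError(f"no traffic breakdown for row type {row.kind!r}")
    if not 1 <= len(dimensions) <= MAX_DIMENSIONS:
        raise ValueError(f"choose between 1 and {MAX_DIMENSIONS} dimensions")
    return {
        "filter": {row.kind: row.name},
        "dimensions": list(dimensions),
        "start": time_range.start.isoformat(),
        "end": time_range.end.isoformat(),
    }


# Example: break traffic down by source and destination IP for one device.
window = TimeRange(
    start=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    end=datetime(2024, 1, 1, 13, 0, tzinfo=timezone.utc),
)
query = build_traffic_query(
    MetricsRow(kind="device", name="edge-router-01"),  # hypothetical device
    window,
    ["Source IP", "Destination IP"],
)
```

Swapping in a new set of dimensions only means calling the function again with the same row and window, which mirrors the discard-and-replace iteration described above.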
Tell us what you think! What's next?
This feature tested pretty well with our field teams, but we're curious what you think of it! Let us know how we can make it better in future iterations.
We've already started thinking of other areas where we want to bring this traffic inspection bottom drawer:
- Adding it to the Capacity Planning workflow so that users can look directly into why an interface is facing imminent congestion
- Bringing it into the NMS Device screen: it has an "Interfaces" tab, which would definitely benefit from the ability to inspect the traffic breakdown for any interface listed there
- ... and then what about the reverse? What about being able to see the CPU chart for a Traffic Breakdown that has the Device name in it?
Stay tuned for upcoming announcements about our plans to bridge our NMS and Flow Analytics worlds!