
Product Updates

Latest features, improvements, and product updates on Kentik's Network Observability platform.

Improvement · Synthetics
a year ago

The New Synthetics Alerting Stack Is Live!


As consumers of alerts, you want every alert to be easy to understand, meaningful, and actionable. But when systems bombard you with alerts for metrics you don't need, it's hard to distinguish the signal of what matters from the noise of what doesn't (cue alert fatigue). Claude Debussy said, "Music is the space between the notes." That space enables resonance and expression. Music needs a degree of emptiness to be truly appreciated, and the same holds true for alerting.


We've listened to your feedback and have significantly improved our synthetics alerting capabilities to help reduce the noise and increase alert accuracy. The changes to Synthetics Alerting deliver more detailed alert notifications, while giving you agency over the individual metrics that are critical to your success – and, perhaps more importantly, the ability to silence the metrics that are not.

Flexible Health Options

For a long time, we have set extreme thresholds to disable certain metrics from causing an unhealthy status in our tests. While this technically works, it does not make for a great user experience. As part of the new alerting stack, the Synthetics team developed new switches that allow individual metrics to be enabled or disabled when determining an unhealthy status and, ultimately, an alert notification.

Single Metric Alerting & Improved Alert Notifications

Another big improvement to Synthetics Alerting is the change to alert tracking by single metrics. The system now keeps track of triggers for each health-enabled metric and will raise alerts for them individually. Combined with Flexible Health Options, you can tailor your alerts to only the metrics that fit your use case. This provides more clarity about what is going wrong with a test when an alert arrives.
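To make this concrete, here's a minimal sketch of how per-metric health switches and single-metric alert tracking can combine. The metric names, thresholds, and data shapes are illustrative assumptions for the example, not Kentik's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class MetricPolicy:
    enabled: bool               # health switch: does this metric affect test health?
    critical_threshold: float   # value at or above which the metric is unhealthy

# Illustrative per-metric policies for a hypothetical HTTP test.
POLICIES = {
    "latency_ms":      MetricPolicy(enabled=True,  critical_threshold=250.0),
    "packet_loss_pct": MetricPolicy(enabled=True,  critical_threshold=5.0),
    "jitter_ms":       MetricPolicy(enabled=False, critical_threshold=50.0),  # silenced metric
}

def evaluate(measurements: dict) -> dict:
    """Return a per-metric status; disabled metrics never contribute to health or alerts."""
    statuses = {}
    for name, value in measurements.items():
        policy = POLICIES.get(name)
        if policy is None or not policy.enabled:
            statuses[name] = "ignored"
        elif value >= policy.critical_threshold:
            statuses[name] = "critical"   # tracked and alerted individually, per metric
        else:
            statuses[name] = "healthy"
    return statuses

print(evaluate({"latency_ms": 310.0, "packet_loss_pct": 0.0, "jitter_ms": 80.0}))
# -> {'latency_ms': 'critical', 'packet_loss_pct': 'healthy', 'jitter_ms': 'ignored'}
```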


Thomren Boyd
Improvement · Synthetics
a year ago

Per Agent Alerts

Based on user feedback, we have added a new option for the way our Network Mesh and Agent-to-Agent tests alert. Previously, alerts were tracked by test – calculating health and alerts for the test as a whole; however, this was not a perfect fit for every use case. 

The Synthetics team has now deployed changes allowing alerts to be calculated and tracked on a per-agent basis. With this feature enabled, each agent in a test has its own trigger for the test's health thresholds, and one test can have multiple ongoing alerts for different agent-to-agent pairings. Simply select the Agent option in the Status Calculations field to have the test tracked per agent.
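As an illustration of the difference, the sketch below keys alert state by agent-to-agent pairing instead of by test. The identifiers and state model are assumptions made up for the example, not Kentik's internal data model.

```python
from collections import defaultdict

# One alert state per (test, source agent, target agent) pairing instead of one per test.
alert_state = defaultdict(bool)

def update(test_id: str, src_agent: str, dst_agent: str, unhealthy: bool) -> None:
    key = (test_id, src_agent, dst_agent)
    if unhealthy and not alert_state[key]:
        alert_state[key] = True
        print(f"ALARM: {test_id} {src_agent}->{dst_agent} breached its thresholds")
    elif not unhealthy and alert_state[key]:
        alert_state[key] = False
        print(f"CLEAR: {test_id} {src_agent}->{dst_agent} recovered")

# A single mesh test can now carry several ongoing alerts at the same time.
update("mesh-1", "fra-01", "iad-02", unhealthy=True)
update("mesh-1", "sin-03", "iad-02", unhealthy=True)
update("mesh-1", "fra-01", "iad-02", unhealthy=False)
```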

If you have any feedback on this new option, please reach out to your CSM. We are continuing to improve our alerting options and aim to bring even more health and alerting features to Kentik Synthetics as we migrate to Alerts 2.0.




Thomren Boyd
Improvement · Synthetics
a year ago

Offline Agent Alerting

At Kentik, one of our core goals with synthetic monitoring is to enable your team to quickly and effectively triage any issues that come your way. This effective triage process starts with ensuring alerts are easily delivered to the right team in their preferred notification method. 

Previously, alerts associated with agent issues (downtime or recovery) were delivered to the same notification channel as test alerts and surfaced in the test results summary in the UI. Based on feedback, we realized this setup was not optimized for customer needs.

We are pleased to share that agents can now be configured to send status alerts to their own notification channel. Simply navigate to the Synthetics > Agent Management section of the portal, select an agent, and click Configure to set up status alerting and notification channel(s) for that specific agent. To see agent downtime details, select the agent and click on the Agent Downtime tab.

With this setup, an email notification is sent for down or recovering agents.

The notification links to the ‘Agent Downtime Details’ tab under the ‘Agent Management’ part of the portal.


Thomren Boyd
New feature · AI
a year ago

Kentik Knowledge Base available in Journeys AI

We are excited to announce a new feature in Kentik Journeys AI (Preview): it enables users to ask "how-to" questions about Kentik documentation and usage. The primary sources of information for the answers are our Knowledge Base and Kentik Blog posts.

Some example questions that Journeys AI can answer:

  • how to install kproxy on ubuntu
  • how to install kbgp in docker
  • what is peeringdb and how it is used in Kentik
  • how to configure AWS cloud flow logs export in Kentik portal
  • what are Cisco IOS XE SD-WAN dimensions and list them in the table
  • what are available endpoints in Kentik API v5
  • can you provide details on how to get all sites over API?

You can also ask questions and get answers in your language, for example: 

  • me mostre como instalar o kentik kproxy no ubuntu. por favor explique isso em português (i.e., "show me how to install Kentik kproxy on Ubuntu; please explain it in Portuguese")

Please share your experience with us and use thumbs up/down to rate the answers you received.

Dušan Pajin
Improvement · Core
a year ago

Default Metrics in Data Explorer

It's no secret that Data Explorer is one of the most used modules in Kentik Portal. Today we're releasing a convenience feature that lets users decide which metrics are used by default for their Data Explorer queries.
Many of our heavy Data Explorer users have asked for this, and we're happy to announce that the feature is now generally available. Enjoy!


By default, Data Explorer always uses the same three default metrics when users build queries from scratch.

Each company has its own golden metrics that it relies on, and in speaking with customers, it became apparent that some of you ended up constantly changing the metrics in Data Explorer when building queries from the ground up.
To make it easier, we built the "Default Metrics" functionality in that same "Metrics Selector" UI.

At the bottom of the metrics selector, you will now notice a new Default Metrics panel.

This panel lets you do the following:

  • Clicking on "Set as Default" will set the currently selected Metrics as default:
    The next new Data Explorer query you will create will have these metrics selected by default. Saving currently selected Metrics to default also includes the settings in the Primary and Secondary Display and Sort Metrics.
  • Hovering on "Load Defaults" will show you the ones you've set as default:
    Before loading these to replace the current set of selected metrics, a tooltip will show you what the metrics are in your default set, as well as which one is the primary, and which one is the secondary. Both will be highlighted with a colored dot, as well as a number showing you which one is primary or secondary.
  • Clicking on "Load Defaults" loads these queries in replacement of the currently selected ones

Important side note
These defaults are set per user: they may differ between different users within the same company.
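Conceptually, per-user defaults behave like this small sketch; the storage shape and metric names are purely illustrative assumptions.

```python
# Hypothetical per-user defaults store: a metric list plus which entries are the
# primary and secondary display/sort metrics.
default_metrics = {}

def set_as_default(user_id, metrics, primary, secondary):
    default_metrics[user_id] = {"metrics": metrics, "primary": primary, "secondary": secondary}

def load_defaults(user_id, fallback):
    # Defaults are per user: two users in the same company can load different sets.
    entry = default_metrics.get(user_id)
    return entry["metrics"] if entry else fallback

set_as_default("alice", ["bits/s", "packets/s", "unique src IPs"],
               primary="bits/s", secondary="packets/s")
print(load_defaults("alice", fallback=["bits/s", "packets/s", "flows/s"]))
```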


Greg Villain
Improvement · Core
a year ago

Label-based RBAC (and more)

With the release of Role Based Access Control (RBAC) in Kentik Portal, we started a long journey that will eventually lead to the legacy User Levels ("Member", "Admin", "SuperAdmin") disappearing from the product. Over the coming quarters, we are migrating existing portal features under the RBAC umbrella, so that more and more functionality becomes subject to granular permissions.
With this iteration, we are adding two changes:

  • we have redesigned the RBAC Admin UI both in the RBAC section, and in the User Management Section
  • we are adding label-based RBAC support for Dashboards and Saved Views

Read on!

Admin interface changes for RBAC Management

We have decided to move away from a modal-based format to a full-screen format, as we felt the information presented for RBAC configuration was too dense to comfortably fit in a small modal. See for yourself in the screenshots below:

On the "Manage Users" screen, beyond the design change of the User Edit modal into a full page, we have added new capabilities such as the ability to identify which role a specific permission on a user comes from:

As well as a way to clone another user's assigned roles onto the current user – this makes user permissioning more straightforward when you already know that a given user needs the same roles as an existing one (because of the team they are on, for instance):

Label-based RBAC

We also added a novel functionality allowing our users to manage RBAC at scale: Label-based RBAC. As you may have noticed, labels have been extended to a lot of additional areas within Kentik Portal, amongst others to the "Library".

With Label-based RBAC, you can assign permissions to a role that apply only to an exclusive set of labels, instead of to all elements of the same kind. This functionality is currently available for Dashboards and Saved Views permissions and lets you restrict a permission to content marked with specific labels – this is shown in the screenshot below, taken from the edit screen of a role.

Here are a few important gotchas about Dashboards and Saved Views RBAC permissions:

  • To change labels (add, remove) on a Dashboard (resp. Saved View), the user needs the Dashboard (resp. Saved View) Update permission (this avoids privilege escalation via RBAC).
  • Labels configured in "Label Access" type permissions are all OR'ed together, meaning the permission is valid for any view carrying any of the configured labels.
  • Label-based Dashboards (resp. Saved Views) permissions supersede the "Sharing" settings of those views: if a view is shared but a given user's RBAC permissions for accessing or editing dashboards are restricted to certain labels, that user won't be able to view or edit it unless it carries one of those labels. In other words, "Label Access" is more restrictive than "Full Access", as the name indicates.
  • If a user has both a "Full Access" permission to views in one role and a "Label Access" permission to the same view type in another, the resulting permission for that view type is the union of both, i.e. "Full Access" (the sketch below illustrates how these rules combine).
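Here is a simplified model of how those rules could combine when deciding whether a user can see a given view. It is an illustrative sketch, not Kentik's actual permission engine.

```python
def can_access(user_permissions, view_labels):
    """
    user_permissions: permissions gathered from all of the user's roles, e.g.
      {"type": "full"}                         -> Full Access
      {"type": "label", "labels": {"netops"}}  -> Label Access
    view_labels: set of labels carried by the dashboard or saved view.
    """
    for perm in user_permissions:
        if perm["type"] == "full":
            return True                                  # union with Label Access = Full Access
        if perm["type"] == "label" and perm["labels"] & view_labels:
            return True                                  # labels within a permission are OR'ed
    return False

# A user whose only permission is label-restricted cannot see an unlabeled view,
# even if that view is shared with them.
print(can_access([{"type": "label", "labels": {"netops"}}], set()))               # False
print(can_access([{"type": "label", "labels": {"netops"}}], {"netops"}))          # True
print(can_access([{"type": "full"}, {"type": "label", "labels": {"x"}}], set()))  # True
```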

What's next?

As we progress on this RBAC journey, your input on which areas of Kentik Portal you most want brought under RBAC matters. We are all ears on what you'd like us to offer next on that front!

Greg Villain
Improvement · Core
a year ago

Interface Classification improvements

Some quirks in our Interface Classification settings screen made it more tedious than it should be to create and test rules. For example, a newly created interface classification rule was automatically sent to the bottom of the stack, so when testing the rule at creation time, it was only evaluated after all the already-present rules.
The only way to test the effects of a rule at a different position was to save it, relocate it within the stack, open it again, and see if the placement change fixed it.

We are adding new capabilities that alleviate this inconvenience – read on!


Interface Classification Refresher

First off, let's recap how Interface Classification works: upon SNMP polling (frequency varies based on your device settings), the classification engine runs each updated interface through the rules stack, in the order in which the rules are defined on the Interface Classification screen – Kentik-managed rules first, then each user rule from top to bottom.
An interface exits the evaluation loop on the first rule whose IF clause matches, and that rule's THEN action is applied to classify it: Connectivity Type, Network Boundary (inherited from Connectivity Type), and Provider if defined. Once the THEN action is applied, the algorithm moves on to the next interface, and so on.
An interface that goes through the whole list without a match stays categorized as "unclassified".
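In rough pseudocode terms, the evaluation loop behaves like the sketch below (types and field names are simplified assumptions for illustration):

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Rule:
    matches: Callable[["Interface"], bool]   # the IF clause
    connectivity_type: str                   # the THEN action
    network_boundary: str
    provider: Optional[str] = None

@dataclass
class Interface:
    description: str
    connectivity_type: str = "unclassified"
    network_boundary: str = "unclassified"
    provider: Optional[str] = None

def classify(ifc: Interface, rules: list) -> Interface:
    """First-match-wins: Kentik-managed rules first, then user rules from top to bottom."""
    for rule in rules:                        # rules are already ordered as on the IC screen
        if rule.matches(ifc):
            ifc.connectivity_type = rule.connectivity_type
            ifc.network_boundary = rule.network_boundary
            if rule.provider:
                ifc.provider = rule.provider
            return ifc                        # exit the evaluation loop on first match
    return ifc                                # no rule matched: stays "unclassified"

rules = [Rule(lambda i: "IX:" in i.description, "IX", "External", "Some IX")]
print(classify(Interface("xe-0/0/0 IX: example"), rules).connectivity_type)   # -> IX
```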

When creating a rule, users are able to test it from the creation modal to make sure the resulting classification is as expected, allowing them to see the number of interface matches per device and zero in on a specific device to inspect the results of the classification action.

The main usability issue came when creating a new rule: the rule was automatically created at the bottom of the stack, and hitting the [Test Rule] button offered no way to place the rule elsewhere in the stack – until now, the only way to do so was to save the rule, go back to the stack, move it, open it again, and re-test it in its new position. This is the part we just improved.

Adding Rules Mid-Stack

When hovering on any of the rules in the stack, users will now see + signs above and below it. These icons allow the user to create a rule Before or After the one currently hovered. Additionally the Rule creation and test context will honor this desired position.

Upon clicking either of these + icons, the rule editor is now aware of the position in the stack at which the rule is being created.
Users can also select other standard positions in this new drop-down, such as "Move to Top" and "Move to Bottom".

Hitting [Test Rule] now tests the new rule at the position it has in the stack, instead of the default bottom position it used to have.

But wait, there's more

While we're at it, we added a few more niceties:

  • When clicking the top-right "Add Rule" button, the editor modal also contains the same drop-down shown previously, to select where in the stack to create and test the rule.
  • The kebab menu at the right of every rule now contains two new options to directly relocate a rule to the Top or Bottom of the stack – we all know how frustrating it can be to drag elements on a web page above or below the fold.
Greg Villain
New feature · Flow
a year ago

Enhanced Flow Ingest with IPFIX IE 315 Support

Kentik is excited to announce an enhancement to our telemetry ingest capabilities with the support for IPFIX Information Element (IE) 315. Let’s dive into how this Information Element is used and why users should care about it.


Understanding the Flow Collection and Export Process

The flow collection process on network devices consists of three stages:

  1. sampling of the packets: selective capturing of the network packets to reduce data volume
  2. aggregation and caching: flow metadata and counters stored in device’s flow cache table
  3. export of the expired flows: sending flow data to a flow collector

During packet sampling, information from the packet headers and the device's interfaces is extracted to form a unique identifier (key) for each flow in the flow cache table. New packets that match an existing flow key increment the flow's byte and packet counters, while unmatched packets trigger the creation of a new flow record.

Flows are exported to the collector based on two conditions, illustrated in the sketch after this list:

  • the inactive-timeout period: no new packets have been detected for a flow, which means that the flow is inactive. 
  • the active-timeout period: current state of flow counters is exported even though the flow is still active.
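Here is a toy model of the cache-and-export logic described above, assuming a simple flow key of addresses, ports, protocol, and interfaces; the timers and structures are simplified for illustration.

```python
import time

FLOW_CACHE = {}            # flow key -> {"bytes", "packets", "first", "last"}
INACTIVE_TIMEOUT = 15      # seconds without new packets -> flow is inactive, export and evict
ACTIVE_TIMEOUT = 60        # seconds since first packet -> export counters even if still active

def on_sampled_packet(key, length, now=None):
    """key could be (src_ip, dst_ip, src_port, dst_port, proto, in_ifindex, out_ifindex)."""
    now = now if now is not None else time.time()
    flow = FLOW_CACHE.get(key)
    if flow is None:                              # unmatched packet -> new flow record
        FLOW_CACHE[key] = {"bytes": length, "packets": 1, "first": now, "last": now}
    else:                                         # matched packet -> increment counters
        flow["bytes"] += length
        flow["packets"] += 1
        flow["last"] = now

def expire_flows(export, now=None):
    now = now if now is not None else time.time()
    for key, flow in list(FLOW_CACHE.items()):
        if now - flow["last"] >= INACTIVE_TIMEOUT:
            export(key, flow)                     # flow went quiet: export and evict
            del FLOW_CACHE[key]
        elif now - flow["first"] >= ACTIVE_TIMEOUT:
            export(key, flow)                     # long-lived flow: export current counters...
            flow.update(bytes=0, packets=0, first=now)   # ...and keep tracking it
```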

The Evolution of Flow Collection in Modern Networks

When flow collection technology was developed, traffic volumes were significantly lower and network devices could "afford" to capture and export data for all traffic flows. However, in today's networks, where hundreds of devices handle traffic at gigabit or terabit-per-second throughputs, complete flow capture has become impractical. To manage these traffic volumes effectively, sampling has become the norm in most use cases, with typical sampling rates ranging from one in a few hundred to one in several thousand packets.

For near real-time traffic analysis, users generally set the inactive-timeout to 15 seconds and the active-timeout to 60 seconds. For DDoS detection use cases, these timers can be configured with even lower values. Given the high sampling rates and rapid flow expiration, the effectiveness of traditional flow caching on network devices is now questionable.

According to research performed by Juniper Networks, with flow sampling in the range of 1:1000 and an active-timeout of 60 seconds, around 90% of the exported flows are expected to have only one matched packet. In such environments, the flow caching process on network devices does not bring much benefit. An alternative approach of exporting packet headers directly to a collector is more effective, and this is where IPFIX IE 315 comes into play.

Introducing IPFIX IE 315

IPFIX IE 315, known as dataLinkFrameSection, carries a sample of the network packet headers, starting from the L2 protocol header up to a maximum sample length determined by the network device capabilities. This capability varies by vendor and device models, but typically supports sampling of 64 to 160 bytes. 

The typical IPFIX flow data record includes: ingress and egress interface index, flow direction, and the length and data of the sampled frame. 

Vendor support

Juniper Networks

Juniper Networks supports IPFIX IE 315 through its IMON (Inline Monitoring) feature. It is supported on the MPC10E and MPC11E linecards for MX-series devices, on the MX304, on the LC480 and LC9600 linecards for the MX10K, and on certain line cards for the PTX10K. The feature is implemented in hardware, so there is minimal delay and no restriction on the volume of exported data. It supports export of 64 to 126 bytes of the packet's header. More information about the implementation, configuration, and device support can be found in Juniper's technical documentation.

The flow data record includes the following IPFIX IEs:

IE Name              | ID  | Length (bytes) | Description
---------------------|-----|----------------|------------------------------------------------
ingressInterface     | 10  | 4              | SNMP index of the packet's ingress interface
egressInterface      | 14  | 4              | SNMP index of the packet's egress interface
flowDirection        | 61  | 1              | Direction of the packet sampling (0 = Ingress, 1 = Egress)
dataLinkFrameSize    | 312 | 2              | Length of the sampled data link (L2) frame
dataLinkFrameSection | 315 | variable       | Carries N octets from the selected data link frame


Cisco

Cisco supports IPFIX IE 315 on its NCS 5000 and ASR 9000 devices. It uses a slightly different flow record, with IPFIX IE 410 (sectionExportedOctets) storing the length of the sampled frame section, and does not include the flow direction field. The exported frame section extends up to and including the L4 header, with a maximum of 160 bytes.

The flow data record includes the following IPFIX IEs:

IE Name               | ID  | Length (bytes) | Description
----------------------|-----|----------------|------------------------------------------------
ingressInterface      | 10  | 4              | SNMP index of the packet's ingress interface
egressInterface       | 14  | 4              | SNMP index of the packet's egress interface
sectionExportedOctets | 410 | 2              | The observed length of the packet section
dataLinkFrameSection  | 315 | variable       | Carries N octets from the selected data link frame
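Once a collector has decoded the IPFIX template and extracted the dataLinkFrameSection octets, turning them into flow dimensions is ordinary packet parsing. The sketch below pulls MAC addresses, EtherType, IP addresses, and ports out of a sampled frame; it assumes an untagged Ethernet II frame carrying IPv4 and skips the full IPFIX template decoding.

```python
import struct

def parse_frame_section(frame: bytes) -> dict:
    """Minimal decode of a dataLinkFrameSection sample (untagged Ethernet II + IPv4)."""
    dst_mac, src_mac = frame[0:6], frame[6:12]
    ethertype = struct.unpack("!H", frame[12:14])[0]
    record = {"dst_mac": dst_mac.hex(":"), "src_mac": src_mac.hex(":"), "ethertype": ethertype}
    if ethertype == 0x0800 and len(frame) >= 34:                  # IPv4 payload
        ihl = (frame[14] & 0x0F) * 4                              # IP header length in bytes
        record["protocol"] = frame[23]
        record["src_ip"] = ".".join(str(b) for b in frame[26:30])
        record["dst_ip"] = ".".join(str(b) for b in frame[30:34])
        if record["protocol"] in (6, 17) and len(frame) >= 14 + ihl + 4:   # TCP/UDP ports
            record["src_port"], record["dst_port"] = struct.unpack_from("!HH", frame, 14 + ihl)
    return record
```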


Benefits of using IPFIX IE 315

The approach of IPFIX IE 315 shifts the packet decoding and metadata extraction work from the network device to the flow collector. This reduces the processing requirements on network devices and eliminates the need to maintain a flow cache table, leading to lower CPU and memory usage and potentially simpler hardware designs. Moreover, the immediate export of packet samples shortens the detection time for DDoS attacks.

Support in Kentik

Kentik's SaaS flow ingest supports IPFIX IE 315 with the default Device Type "NetFlow-enabled device". The feature is also available in Kentik kproxy version 7.43.0, and it supports both the Juniper and Cisco implementations.

Dušan Pajin
New feature
a year ago

Evaluate traffic on Internet Exchanges (IX) and Data Centers (PNI) with PeeringDB

Some features are hidden jewels that are hard to spot: they don't come with a shiny UI that makes them stand out, but they pack a heavy punch, and you wouldn't know unless someone reveals their true power to you.
This is one of those features. More often than not, assessing the benefit of joining a certain exchange (i.e., the amount of traffic that could be peeled off to the IX) requires so much discouraging work that these decisions end up being made on gut feel, almost as an act of faith.

Enter the Data Explorer PeeringDB filter dimensions!


The theory

Let's say we want to evaluate how much of our actual traffic could be peered off at a specific Internet Exchange. What do we do as network engineers without access to Kentik?

  1. Collect source ASN and destination ASN flow data, inbound and outbound, then dump it in a spreadsheet.
  2. For each ASN in that breakdown, list all Internet Exchanges it is present at, either by leveraging the PeeringDB public API or manually by reviewing each ASN's PeeringDB record – some ASNs are at tens if not hundreds of exchanges. At the same time, note what its peering policy is.
  3. Run that ASN list through a sieve to keep only those at the Internet Exchange you are interested in, and that have a peering policy you can comply with.
  4. Sum all the resulting traffic (a rough sketch of steps 2-4 follows this list).
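For a sense of what steps 2-4 look like in practice, here is a rough sketch against the PeeringDB public API using the netixlan and net endpoints; the endpoint and field names are assumptions worth verifying against the PeeringDB API documentation.

```python
import requests

PDB = "https://www.peeringdb.com/api"

def ix_presence(asn):
    """Names of the exchanges where this ASN reports a presence (netixlan records)."""
    recs = requests.get(f"{PDB}/netixlan", params={"asn": asn}, timeout=30).json()["data"]
    return {rec["name"] for rec in recs}

def peering_policy(asn):
    nets = requests.get(f"{PDB}/net", params={"asn": asn}, timeout=30).json()["data"]
    return nets[0]["policy_general"] if nets else "Unknown"

# Keep only the top ASNs present at the exchange of interest with an Open policy,
# then sum their traffic as dumped from the flow breakdown.
top_asns = {64496: 120.5, 64511: 80.2}      # ASN -> Mbps (example numbers)
target_ix = "LINX NoVA"
peelable = sum(
    mbps for asn, mbps in top_asns.items()
    if target_ix in ix_presence(asn) and peering_policy(asn) == "Open"
)
print(f"~{peelable:.1f} Mbps could potentially be peered off at {target_ix}")
```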

Having done this routinely in my career, I can safely say that:

Either you have coding skills and can code your way out of it, or you don't and this will take multiple hours. This illustrates the root of inefficient peering decisions: analysis is gated by the inordinate difficulty of collecting and correlating simple data sets, and the common result is that hunches and arcane lore end up replacing the science.

So how does it work?

Here's how we wanted this to work: add filters to Data Explorer, so that users can make these types of queries – let's think SQL for a second:

Show me my top Destination ASNs where said ASNs are registered in this specific IX, with an open policy

When the query is run, Kentik's Data Engine issues a two-step query:

  1. Pulls the list of top destination ASNs.
  2. Pulls the list of ASNs present at this specific IX and filters it against the list returned in the first step.

Remember, Kentik is mirroring the PeeringDB database daily, so this information is always fresh.
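In other words, the second step reduces to a simple set filter over fresh PeeringDB data, along these lines (a simplified illustration, not the actual Kentik Data Engine query):

```python
def peerable_traffic(top_dst_asns, ix_member_asns):
    """Step 2: keep only the top ASNs that PeeringDB lists as members of the chosen IX."""
    return {asn: bps for asn, bps in top_dst_asns.items() if asn in ix_member_asns}

# Step 1 results (top destination ASNs with their traffic) filtered by IX membership.
print(peerable_traffic({64496: 2.1e9, 64497: 9.0e8, 64511: 3.4e8}, {64496, 64511}))
```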

And voilà!

Cool story bro, now just show me a real life example kthx.

Let's try this with a concrete example: Kentik is present at LINX NoVA. I want to know how much inbound traffic we can peel off from our Transit upstreams at this exchange. We've got a few peering sessions there, but I have a hunch I can reclaim quite a bit of bandwidth through this exercise – let's see if I'm right.

I'm going to look at these dimensions, no surprise here

But when building my filter, I'm going to use this new PeeringDB section here, where I can now set whatever IX I want my source top ASNs to be part of:

Notice how you can leverage this same dimension with a "SOURCE or DESTINATION ASN is member of this IX"

You'll notice that this filter item sources its values directly from PeeringDB, so I'll enter "LINX NoVA" for "Source ASN is Member of IX" – note that I'm filtering on Source Network Boundary = External because I just want to see the traffic coming into my network.

My intuition is confirmed: looking at the resulting Sankey diagram, it appears that only a small part of the traffic I receive from LINX NoVA member ASNs is actually coming in through my IX port at LINX NoVA – see for yourself.
According to this query, I could peel ~384 Mbps of inbound Transit traffic off to this exchange by establishing the right sessions with the ASNs on the right side of the diagram that I'm currently receiving traffic from.

Now, if I really want a realistic estimate, I need to ensure that I'm only considering traffic from ASNs with an Open peering policy – no problem: PeeringDB also has that information, so let's just add it to the filter.

And so the Transit part of the previous Sankey is revised down with a bit less traffic, because the query engine now discards source ASNs without an Open policy:

As always, there's more!

There are more use cases that this well-hidden gem enables:

  • Reclaim Transit traffic and convert it to IX peering traffic, and more importantly maximize the use of your IX ports
  • Identify PNIs to secure with IX peering as a fallback, or additional peering locations you can meet your existing peers at for more resilience 
  • More than anything: evaluate which IX to deploy to next based on how much assured traffic you can peer off and estimate the port capacity you will need.

All of these are focused on IXes, but the same applies to the other filter dimension set, "ASN member is available at", which helps you answer the same questions from a data center standpoint for PNIs.

This is a pivotal step towards an even better outcome: what if, every day, Kentik ran a report on all of your top Source or Destination ASNs and made informed, analytical suggestions about which of them you should go after next?

Watch this space, because this is where we are headed.

Greg Villain
Improvement · Core · Service Provider
a year ago

Interface Classification: PeeringDB integration

If you remember back in May 2023, we released our initial version of a PeeringDB integration - more details here: https://new.kentik.com/we-now-integrate-peeringdb-data-akabK

We are now taking it to the next level by bringing PeeringDB's dataset into our interface classification engine. The idea is to auto-detect which of your interfaces are connected to a well-known Internet Exchange and classify them automatically, without requiring you to create a rule for it.


The theory

PeeringDB contains a directory of most of the registered Internet Exchanges (IXes), as shown in the extract below. For each of them, the data includes the IX's IP ranges, both v4 and v6.

For each interface we poll on your devices using SNMP or Streaming Telemetry, we get information such as the configured IP addresses. Using the aforementioned IX IP range data, we can then match any interface to an Internet Exchange from PeeringDB based on its IP address.

What does it look like in the product?

We've added a somewhat magic rule, which we have decided to call a Kentik-managed Rule in our Interface Classification UI. This rule is a bit special compared to the other ones in the stack:

  • It cannot be edited by users
  • It can only be enabled/disabled by users
  • It sits at the top of the rules stack, which means it is always evaluated first, ahead of any user-defined rule (because, for each interface, the algorithm exits on the first match)
  • This rule looks each interface's IP addresses up in the IP ranges of all the IXes in PeeringDB (a rough sketch of this lookup follows the list below); if a match is found:
    • it sets the interface to Connectivity Type = IX
    • it sets the interface to Network Boundary = External
    • it sets the Interface Provider to the name of the IX from PeeringDB
    • it establishes the mapping between this IX and the PeeringDB Exchange in the PeeringDB integration configuration so you don't have to
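A rough sketch of that lookup, using PeeringDB's published IX LAN prefixes and Python's ipaddress module; the ixpfx endpoint and field names are assumptions worth double-checking against the PeeringDB API documentation.

```python
import ipaddress
import requests

PDB = "https://www.peeringdb.com/api"

def load_ix_prefixes():
    """All IX LAN prefixes (v4 and v6) published in PeeringDB, with their ixlan id."""
    data = requests.get(f"{PDB}/ixpfx", timeout=60).json()["data"]
    return [(ipaddress.ip_network(rec["prefix"]), rec["ixlan_id"]) for rec in data]

def classify_interface(ip, ix_prefixes):
    addr = ipaddress.ip_address(ip)
    for prefix, ixlan_id in ix_prefixes:
        if addr in prefix:
            return {                        # the THEN action of the Kentik-managed rule
                "connectivity_type": "IX",
                "network_boundary": "External",
                "ixlan_id": ixlan_id,       # used to resolve the exchange name as the Provider
            }
    return None                             # no match: fall through to the user-defined rules

prefixes = load_ix_prefixes()
print(classify_interface("192.0.2.10", prefixes))   # documentation address, unlikely to match
```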

Here's an example of what the Interface Classification test UI looks like for this rule when there are matches:

Existing Customers vs New Customers

Obviously, many of you may already have your interfaces well classified, most likely using dynamic regex matching on interface descriptions to get your IX classifications going.
As we didn't want to disrupt your existing setup, we have adopted what we think are reasonable defaults for rolling the feature out:

  • Existing customers will have the PeeringDB rule disabled by default
    -> up to them to enable it if they would like
  • New customers will always have the PeeringDB rule enabled by default

But wait, there's more...

As an additional freebie, we've modified the Interface Provider field for any custom rule that sets an interface's Connectivity Type to IX: you can now choose to keep your own provider naming, or select the actual PeeringDB name for the Internet Exchange – in which case the field offers a typeahead that looks values up directly from the PeeringDB exchange names, as showcased below.

There are still a lot of features we plan to implement on top of the PeeringDB dataset, so watch this space, and don't hesitate to give us feedback!


Greg Villain