a month ago

Syslog additions are here!

Our users and readers probably noticed a few months back that we added the ingestion of Syslog messages and SNMP Traps to our platform via NMS.
Today we are bringing new enhancements to that Syslog functionality. But before we dig into those, let's do a quick recap of what was already available.

Setting up Syslog on a Network Device

The first order of business when you want to add Syslog observability to your Kentik setup is to deploy our Universal Agent and enable its syslog capability. Deploying the Universal Agent is trivial: all that's needed is to enable the Syslog Server capability during the final steps of the deployment process, as pictured below:

For an agent that is already deployed, the capability can be installed and enabled on the fly from the agent list under Settings > Universal Agent, as depicted below.

Once this is done, all you need to do is configure your network devices to send Syslog records to this agent. If necessary, you can also configure the listening IP and ports on the syslog capability.
Provided your device is NMS enabled, the Kentik cluster will ingest Syslog records from the device via the Universal Agent's syslog capability.
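To check the path from a device to the agent, it can help to send a hand-crafted test message. The sketch below is a minimal Python example, not a Kentik tool: it assumes the syslog capability listens on UDP port 514 (the conventional syslog port; use whatever listening IP and port you configured) and builds an RFC 3164-style message whose priority value is facility * 8 + severity. Keep in mind that records are only ingested for NMS-enabled devices, so a test sent from another host may not show up.

```python
import socket

# Syslog severity codes (defined in RFC 5424, same values as RFC 3164).
SEVERITIES = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3,
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}
FACILITY_LOCAL7 = 23  # local7, e.g. the default logging facility on Cisco IOS


def build_syslog_message(hostname: str, tag: str, text: str,
                         severity: str = "info",
                         facility: int = FACILITY_LOCAL7,
                         timestamp: str = "Jan  1 00:00:00") -> bytes:
    """Build an RFC 3164-style line: <PRI>TIMESTAMP HOSTNAME TAG: MSG."""
    pri = facility * 8 + SEVERITIES[severity]
    return f"<{pri}>{timestamp} {hostname} {tag}: {text}".encode("utf-8")


def send_syslog(message: bytes, agent_ip: str, port: int = 514) -> None:
    """Send one syslog datagram over UDP (fire-and-forget, no delivery guarantee)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (agent_ip, port))
```

For example, `send_syslog(build_syslog_message("edge-rtr-1", "test", "hello"), "192.0.2.10")` would send a single datagram to a placeholder agent address. UDP gives no delivery confirmation, so check the Events View afterwards to see whether the message arrived.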

How can I access Syslog data for analysis? (part 1)

All ingested syslog data is enriched with a set of attributes and immediately stored in our universal telemetry datastore, making it ready for analysis.
Syslogs (as well as SNMP Traps) correspond to a new broad type of Telemetry available in our platform called Events. We intend to add additional event types in the near future, so stay tuned!

Once syslog data is ingested, new Metric and Dimension choices (for both Group By and Filter) become available in Data Explorer, as depicted below, allowing for very granular queries on event types.

Here's an example of a Data Explorer query displaying syslog volumes per device and per severity over time.

Additionally, a new Events View tab appears on the data table, displaying the complete, unaggregated list of all Syslog messages captured in the query's time window that match its filters.

Of course, all of these visualizations also work with a number of other useful capabilities, such as:

Saved Views
Dashboards
Filter Based Dimensions
Generate One Chart per Series

So what's new with Syslog? (part 2)

As of today, we're adding a few niceties, and this is where it all becomes interesting. The keen eye will have noticed a new filter in the NMS Status section of the Infrastructure > Devices inventory screen: it lets users narrow the device list down to the devices we are receiving Syslog entries for.

...but the real deal is that, starting today, a new Syslogs tab appears on the Infrastructure > Device > device details screen. Let's say you are doing your rounds on a device, eyeballing its metrics in the Overview tab: you see a CPU spike on the overview sparkline and want to check whether an error reported via syslog might explain it. Here's what it looks like when you click on the Syslogs tab. Users can now easily search for specific syslog entries on this device, of any Severity and within any Time Range.

What does the future hold?

As mentioned earlier, we are just getting started on Events. They will soon play an important role in our unified observability stack, and they are also available to our recently released AI Advisor: when you click the Ask button at the top right of a Device Details page to summon the agent, it will already have the device's recent Syslogs as part of its context.

As a fast follow-up, we will soon be adding SNMP Traps as a similar tab in the Device Details screen.

Later this year, we will be opening up amazing possibilities to query Traffic (Flow), NMS (SNMP/ST), Events (syslogs, traps & more) and Performance (Synthetics) together in a cohesive set of visualizations, so stay tuned: things are just about to become interesting!

a month ago

Low-Resolution login screen, Kentik Next available to SSO users

Low Resolution login screen

Many of our users have to access Kentik Portal from a web browser running via a Terminal Server or some equivalent VNC variant.
Some of our users have reported that the animated gradient on the new Login Screen resulted in a very laggy experience in such remote setups.

To solve this issue, we have introduced a query argument for the login URL, which these users can bookmark, that displays a static gradient instead of the animated one:

https://portal.kentik.com/login?lowres=true if you are a US Platform user, or https://portal.kentik.eu/login?lowres=true if you are an EU Platform user.

Kentik Next for SSO users

https://next.kentik.com (or https://next.kentik.eu for our EU users) is a public version of Kentik Portal that is always ahead of our production systems: it is the equivalent of an OS's Nightly Builds, a public counterpart to our portal.kentik.com main instance. "Next", as we call it, exists so that our customers can experience the next features to hit our production systems. (Disclaimer: as the paint may constantly be wet on it, we advise users to stick to the production systems to perform production work.)

Up until now, Next was not available to our SSO users, but it is now, so give it a try!

2 months ago

WebAuthN Authentication in Kentik Portal is here!

In our everlasting quest to strengthen security around the Kentik Platform, we're happy to introduce WebAuthN today: an increasingly adopted, browser-native Web Authentication standard with many benefits over prior ones.
Until today, we offered Multi-Factor Authentication (MFA) to our users via these 2-Factor (2FA) methods: Time-based One-Time Password (TOTP, also known as Authenticator App-based Tokens) and FIDO Alliance certified hardware keys such as YubiKeys.
While these methods offer a better security level than plain user/password authentication and we strongly encourage our users to adopt 2FA, the standards have evolved to new, more secure methods that we are now proud to offer to our user base.

Let's see what this is all about!

Authentication security concepts

Let’s take a look at modern improvements recently achieved in the domain of Web Authentication.

"Something you are"

In authentication, there are three categories of credentials (or factors) used to verify a user's identity. They are: something you know (like a password), something you have (such as a security token), and something you are (like a fingerprint). Using a combination of two or more of these factors is known as multi-factor authentication (MFA).
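The key point is that MFA requires factors from different categories: a password plus a PIN is still a single factor. A minimal Python sketch of that rule; the credential names are illustrative only and do not correspond to Kentik settings.

```python
from enum import Enum


class Factor(Enum):
    KNOWLEDGE = "something you know"
    POSSESSION = "something you have"
    INHERENCE = "something you are"


# Illustrative mapping of credential kinds to factor categories (hypothetical names).
CREDENTIAL_FACTORS = {
    "password": Factor.KNOWLEDGE,
    "pin": Factor.KNOWLEDGE,
    "totp_app": Factor.POSSESSION,
    "hardware_key": Factor.POSSESSION,
    "fingerprint": Factor.INHERENCE,
    "face": Factor.INHERENCE,
}


def is_multi_factor(credentials: list[str]) -> bool:
    """True only when the credentials span at least two distinct factor categories."""
    categories = {CREDENTIAL_FACTORS[c] for c in credentials}
    return len(categories) >= 2
```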

Modern Authentication favors something you are, through Biometric Methods: Fingerprint Recognition (known as Touch ID for Apple users, or Windows Hello for Microsoft users) or camera-based Face Recognition (known as Face ID for Apple users, Face Unlock for Android users). While a malevolent actor can phish something you know or steal something you have, it is much harder to spoof something you are when it is based on your unique biometric markers.

Public/Private keys

Another recent security improvement on the web is the adoption of browser-based Public-key credentials extensions (WebAuthN, which we’ll talk about in a minute, uses this scheme).

In a public-key based Authentication model, a pair of keys (a public key and a private key) is used for authentication. The remote authenticating system stores a user's public key (which can be visible to anyone) and a credential ID, not a password. The private key, which is the secret half of the key pair, is stored securely on the user's device, not on the server.

This design offers significant security benefits compared to traditional passwords:

Security by design: The server has no shared secret with the user that could be compromised. The public key is useless to an attacker on its own.
Phishing resistance: The private key is cryptographically bound to a specific website domain, so it cannot be used on a fake phishing site to trick the user.
Data breach protection: If a server's database is breached, the attacker can only steal public keys and credential IDs, which cannot be used to impersonate a user.
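The phishing resistance described above comes from checks the server performs at every login. The Python sketch below illustrates the non-cryptographic checks a relying party runs on a WebAuthN assertion, following the W3C Web Authentication specification: the `type`, `challenge` and `origin` fields of `clientDataJSON`, plus the SHA-256 hash of the RP ID at the start of `authenticatorData`. It is not Kentik's implementation; the domain names are placeholders, and the final signature check against the stored public key is deliberately omitted because it needs a third-party crypto library.

```python
import base64
import hashlib
import json


def b64url_encode(data: bytes) -> str:
    """Base64url without padding, as used for the challenge in clientDataJSON."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def check_assertion_context(client_data_json: bytes, authenticator_data: bytes,
                            expected_challenge: bytes, expected_origin: str,
                            rp_id: str) -> bytes:
    """Run the origin, challenge and RP ID checks of a WebAuthN assertion.

    Returns the bytes the authenticator signed (authenticatorData ||
    SHA-256(clientDataJSON)); verifying that signature with the stored
    public key is left out of this sketch.
    """
    client_data = json.loads(client_data_json)
    if client_data.get("type") != "webauthn.get":
        raise ValueError("not an authentication assertion")
    if client_data.get("challenge") != b64url_encode(expected_challenge):
        raise ValueError("challenge mismatch (possible replay)")
    if client_data.get("origin") != expected_origin:
        raise ValueError("origin mismatch (possible phishing site)")
    if len(authenticator_data) < 37:  # 32-byte rpIdHash + 1 flags byte + 4-byte counter
        raise ValueError("authenticatorData too short")
    # The first 32 bytes are SHA-256(RP ID): this binds the credential to one domain.
    if authenticator_data[:32] != hashlib.sha256(rp_id.encode()).digest():
        raise ValueError("RP ID hash mismatch")
    if not authenticator_data[32] & 0x01:  # UP flag: user presence
        raise ValueError("user presence flag not set")
    return authenticator_data + hashlib.sha256(client_data_json).digest()
```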

What is WebAuthN?

WebAuthN is a core component of FIDO2, the latest version of the FIDO Alliance's open authentication standard. It is an effort to bring strong 2FA to the web and is based on the W3C's Web Authentication API, which is supported by all major modern web browsers.
In a nutshell, WebAuthN brings these attractive improvements to prior 2FA technologies:

it is the leading open authentication standard on the web: it is widely adopted, can be audited, and comes natively in most recent browsers
it adds public-key cryptography to most existing 2FA methods, securing them further (with the exception of TOTP, which becomes the least secure 2FA method)
because most recent browsers are tightly integrated with the hardware and OS they run on, it brings Biometrics (aka "something you are") to web authentication, alleviating the need to procure physical keys

What does it look like in Kentik Portal?

To enable WebAuthN, we've made changes to the Authentication tab of the User Profile section, surfacing the new 2-Factor capabilities now offered to users.

But before we dive into these changes, let's summarize the security level that Kentik Portal offers for each Authentication method:

Multiple 2-Factor Methods per user

Kentik still allows each user to configure multiple 2-Factor Authentication methods in their User Profile, which lets users configure backups or set up alternatives for when they're at home and on the go. A user can configure and name as many of these as they desire.

These Authentication methods are now split into 3 separate tables (click the button at the top right of each table to add one):

(1) Legacy Methods:
The least secure 2-Factor methods: this table includes your legacy hardware keys, such as YubiKeys, and your TOTP.
You can re-create a new entry for your YubiKey in the Security Keys table, which will make it WebAuthN compliant (more secure): we strongly encourage you to do so!
Because Time-based One-Time Passwords aren't compatible with the WebAuthN standard, they will stay in this "Legacy Methods" section; we advise moving away from them.
That being said, they're still a better alternative than no 2-Factor at all.
(2) Device Authenticators:
These are Hardware/OS level biometrics such as Apple's Touch ID and Windows Hello. They are currently considered the most secure methods, because they correspond to the "Something you are" principle.
Registration of these via the Enroll Device button is natively supported by most recent browsers using a common UI.
(3) Security Keys:
These authentication factors include hardware USB Security Keys such as YubiKey or Google's Titan (both FIDO and FIDO2 models), which are natively WebAuthN compliant; FIDO2 keys additionally come with a PIN code.
In addition to these keys, you can also configure mobile-based (iPhone or Android) WebAuthN compliant methods in this section. In this clever method, a QR code is presented to the user at login time, triggering the phone's native biometric UI to proceed with a Face ID / Face Unlock verification.

When multiple methods are available, authentication will always prioritize the Device Authenticators first, via a native browser prompt. If other WebAuthN methods have been configured by the user such as a YubiKey or an iPhone/Android Mobile Authentication - these will be available as part of the same prompt by choosing Other Methods. (see screengrab below)

Biometrics authentication is always prioritized in the Native browser integration

As a user, what should I do?

This choice depends a lot on the Security policy dictated by your company, which you should always conform to.
With that said, as outlined by the previous diagram in this article comparing the security levels of the available methods, Kentik highly suggests that you always opt for the most secure method possible, as summarized in the following recommendations:

Always use 2-Factor – plain password authentication is unsafe.
If your current 2-Factor is TOTP, you should consider adding a WebAuthN compatible one now – in this case HW based biometrics are your best choice since they’re available on any recent laptop or mobile device.
If your current 2-Factor is a YubiKey, you should consider
- re-registering it in the Security Keys section to add WebAuthN to it
- adding a biometrics-based Device Authenticator if your computer allows it, it will be prioritized over the YubiKey
Try to have at least two methods configured, in case you lose one of them or if it happens to get compromised – so that you won't lose access to Kentik portal.

As someone who is responsible for Kentik App Security, what should I do?

As a security-focused Kentik Administrator, you want to increase Authentication Security for all your SaaS Applications, Kentik being no exception. To make it easier for you to migrate users from weaker 2FA methods to stronger, WebAuthN-based ones, we added a filter in the Company Settings > Users screen to identify users based on their 2FA settings:

...where Strong refers to WebAuthN 2FA methods and Weak (Legacy) to all the other, less preferred ones. Note that the Only Weak option lets you identify the users who haven't yet migrated to a stronger, WebAuthN-based 2FA method.

Additionally, a new Custom button now appears at the top right of the Users table, letting you add two new columns to help Kentik admins track 2-Factor adoption within their company:

Strong Authenticators: number of WebAuthN 2Factor Authenticators configured
Weak Authenticators: number of non-WebAuthN 2Factor Authenticators configured
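The Strong/Weak split can be pictured as a simple count per user. The sketch below assumes each configured authenticator is tagged as WebAuthN-based or not (the class and field names are hypothetical) and derives the two columns, plus an Only Weak filter interpreted here as "at least one legacy method and no WebAuthN method yet".

```python
from dataclasses import dataclass, field


@dataclass
class Authenticator:
    label: str
    webauthn: bool  # assumption: each method is tagged as WebAuthN-based or legacy


@dataclass
class User:
    email: str
    authenticators: list[Authenticator] = field(default_factory=list)

    @property
    def strong_count(self) -> int:
        """Strong Authenticators column: WebAuthN-based methods."""
        return sum(1 for a in self.authenticators if a.webauthn)

    @property
    def weak_count(self) -> int:
        """Weak Authenticators column: non-WebAuthN (legacy) methods."""
        return sum(1 for a in self.authenticators if not a.webauthn)


def only_weak(users: list[User]) -> list[User]:
    """Users who still rely exclusively on legacy 2FA methods."""
    return [u for u in users if u.weak_count > 0 and u.strong_count == 0]
```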

What's next for Kentik Portal Authentication?

Making 2-Factor authentication mandatory

At this juncture, we're seriously considering this as the next step. There are a couple of ways we could go about it:

In a first step, we could expose a company-wide setting where your security staff could set it as mandatory for your tenant to respect your company's security stance, with a disabled default to make for a smooth and easy transition.
In a second step, we could make it mandatory by default and bake it in the user registration/onboarding process.

One of the reasons we haven't made a call about it yet is that a lot of customers have a centralized AAA strategy to access their SaaS apps that goes through centrally managing it via SSO, with the implication that the underlying SSO should take care of the multi-factor strategy.

Do let us know what your preference would be on the matter!

A note on Password-less authentication

One of the eventual benefits of WebAuthN is password-less authentication, such as Passkeys: this approach allows users to register with a web application without ever providing the proverbial insecure password, so that our user profile data store only holds the Public Key generated during the initial WebAuthN registration challenge.
While this is one of our long term goals, password-less is not part of this release, as it requires us to completely overhaul the user registration process.

Still, do let us know if password-less authentication is something you'd like to see in the product in the future.

2 months ago

New BGP and Routing related Dimensions

One of Kentik's core missions has always been to help our users make sense of their infrastructure, taking the front seat in the Network Intelligence space by constantly enriching the Telemetry our users send us to ingest.
This release adds new BGP dimensions and filters for you and the AI Advisor to leverage as you are trying to make sense of the Infrastructure at the edge of your network.

Let's dive into it!

How do BGP enrichments work?

When registering devices in Kentik, you have the option of establishing a BGP session with our SaaS or OnPrem cluster. These sessions (IPv4 and IPv6) are configured as iBGP Route Reflector Clients.
As we ingest your NetFlow/sFlow/IPFIX telemetry, we match the SRC_IP and DST_IP flow fields against the routing data gathered from these iBGP sessions and enrich flows with useful dimensions such as:

Source or Destination ASN (Autonomous System)
AS Path for the outgoing traffic
Next-Hop, 2nd Hop, 3rd Hop ASN from the AS Path
BGP Communities
A variety of VRF related dimensions
...

If you don't peer directly with our clusters for a Kentik-registered device, you can choose to adopt the routing table of another device, or use a Generic Routing Table to access part of that information.

Alternate enrichment of Source or Destination ASN with a Generic Table

If your device is iBGP-peered with Kentik's SaaS cluster, the Source and Destination ASN enriched in your NetFlow/sFlow/IPFIX records are primarily based on your own routing data, but fall back to a Generic Routing table if your own routing information has no entry for a given Source or Destination IP. This Generic Routing table is built from MRT route dumps from the RouteViews Project (courtesy of the University of Oregon).

Two things are worth noting:
you should never send default routes (0.0.0.0/0 or ::/0) in your iBGP Route Reflector Client sessions to Kentik, as a default route matches every source or destination you otherwise have no route for, so those IPs would never fall back to the Generic Routing table
if your device does not have an iBGP session established with Kentik, we'll use this Generic Routing table for your entire traffic
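The enrichment and fallback described above boil down to a longest-prefix match on each flow IP. The Python sketch below uses the standard `ipaddress` module with documentation-range prefixes and ASNs; the table layout (prefix mapped to origin ASN) is an assumption for illustration, not how Kentik stores routes.

```python
import ipaddress
from typing import Optional


def longest_prefix_match(table: dict, ip: str) -> Optional[int]:
    """Return the origin ASN of the most specific prefix in `table` containing `ip`.

    `table` maps ipaddress network objects to origin ASNs (hypothetical layout).
    """
    addr = ipaddress.ip_address(ip)
    best_len, best_asn = -1, None
    for prefix, asn in table.items():
        # `addr in prefix` is simply False when the address families differ.
        if addr in prefix and prefix.prefixlen > best_len:
            best_len, best_asn = prefix.prefixlen, asn
    return best_asn


def enrich_asn(ip: str, own_table: dict, generic_table: dict) -> Optional[int]:
    """Prefer the device's own iBGP routes; fall back to the generic table."""
    asn = longest_prefix_match(own_table, ip)
    if asn is None:
        asn = longest_prefix_match(generic_table, ip)
    return asn
```

This also shows why a default route is harmful: with only 203.0.113.0/24 in your own table, 198.51.100.7 falls back to the generic table, but once a 0.0.0.0/0 route learned from an upstream is added, that same IP resolves to the upstream's ASN and the fallback never happens.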

In some cases, using your BGP tables to enrich your traffic may hide an issue: if your BGP sessions or routes are intermittent, you may see the Source or Destination ASN flapping for the same prefix, which can result in a long and often sterile investigation.

We are now solving that problem by adding two dimensions, Source ASN (Generic Table) and Destination ASN (Generic Table), to the default available BGP dimensions. These are additions and do not replace the original Source ASN and Destination ASN dimensions: they can be used together within the same Data Explorer query to track down such situations more rapidly. You'll find them in the Dimensions selector, as depicted in the screenshot below:
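The reasoning behind pairing the two dimensions can be sketched outside the portal: over a time window, a prefix whose own-table ASN keeps changing while its Generic Table ASN stays constant is a good flapping candidate. The record layout below is hypothetical, standing in for rows of a query grouped by prefix and by both ASN dimensions.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple

# Hypothetical row: (prefix, ASN from own BGP table, ASN from Generic Table)
Row = Tuple[str, Optional[int], Optional[int]]


def flapping_prefixes(rows: Iterable[Row]) -> list[str]:
    """Prefixes whose own-table ASN varied while the Generic Table ASN did not."""
    own_asns = defaultdict(set)
    generic_asns = defaultdict(set)
    for prefix, asn_own, asn_generic in rows:
        own_asns[prefix].add(asn_own)  # None (no route) counts as a distinct value
        generic_asns[prefix].add(asn_generic)
    return sorted(p for p in own_asns
                  if len(own_asns[p]) > 1 and len(generic_asns[p]) == 1)
```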

Collapsed AS Path

Every prefix learned from a BGP peer carries an AS Path, which lists the series of Networks (each identified by its Autonomous System Number, aka ASN) the route has traversed. This path is used heavily in the BGP decision process to determine which route is best when several are received, and its length is a key decision factor: all else being equal, BGP best-path selection prefers the route with the fewest ASN hops in its AS Path attribute.

Most BGP-speaking Networks are Multi-Homed: they have at least two upstream providers from which they receive the full Internet routing table. While it is trivial, BGP-wise, to influence which of the two upstream providers you select for any destination prefix, it is much more complicated (if not impossible) to influence which of the two upstreams you receive traffic from in priority.
To help with that, BGP offers a mechanism named AS Path Prepending, which allows any AS along the path to insert its own ASN additional times into the AS Path attribute of the prefix, as a last-ditch effort to have its upstreams prefer another route for this prefix (last-ditch because this is far from being an efficient method).

In the following example, AS62775 originates two /48 IPv6 prefixes and announces them to AS396955, which in turn announces them to AS1299. AS396955 prepends its ASN one more time when announcing to AS1299, signaling that it would prefer AS1299 not use it to reach these AS62775 prefixes.

While the prepending information in the AS Path is useful in itself, because it publicly exposes the prepending party's policy, it doesn't add much to the visualization if you only want to display each network that your traffic towards these prefixes will go through.

As a way to de-noise the above picture, we've come up with a set of additional AS Path related dimensions that collapse repeated hops in the AS Path. These dimensions come in addition to the existing AS Path related ones, as can be seen on the screenshot below:

Using AS Path (Collapsed) instead of AS Path as a Group By dimension will yield the following sankey for the same prefixes
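Collapsing a path amounts to merging consecutive repeats of the same ASN, which is exactly what prepending produces. The Python sketch below applies this to the AS Path that AS1299 receives in the example above; it illustrates the concept rather than Kentik's implementation, and it ignores AS_SET segments. It also shows that the uncollapsed length is the one BGP compares, which is why prepending has any effect at all.

```python
from itertools import groupby


def collapse_as_path(as_path: str) -> str:
    """Merge consecutive repeated ASNs (e.g. prepends) into a single hop."""
    return " ".join(asn for asn, _ in groupby(as_path.split()))


def as_path_length(as_path: str) -> int:
    """AS Path length as best-path selection counts it: prepended hops included."""
    return len(as_path.split())


received_by_as1299 = "396955 396955 62775"
print(collapse_as_path(received_by_as1299))  # 396955 62775
print(as_path_length(received_by_as1299))    # 3
```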

IPv6 Flow Labels

The IPv6 flow label is a 20-bit field in the IPv6 header used to identify packets belonging to the same traffic flow, allowing routers to provide special handling for them. A flow is a sequence of packets from a specific source to a destination. The label is used to efficiently handle and prioritize these flows, such as for real-time voice or video, without inspecting the entire packet payload.
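Concretely, the flow label occupies the low 20 bits of the first 32-bit word of the IPv6 header, right after the 4-bit Version and 8-bit Traffic Class fields (RFC 8200). A minimal Python sketch of extracting it from raw header bytes; it only shows the bit layout and says nothing about how exporters report the field.

```python
def parse_ipv6_first_word(header: bytes) -> tuple[int, int, int]:
    """Split the first 32 bits of an IPv6 header into
    (version: 4 bits, traffic class: 8 bits, flow label: 20 bits)."""
    if len(header) < 4:
        raise ValueError("need at least the first 4 bytes of the IPv6 header")
    word = int.from_bytes(header[:4], "big")
    version = word >> 28
    traffic_class = (word >> 20) & 0xFF
    flow_label = word & 0xFFFFF
    return version, traffic_class, flow_label
```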

As this field gets adopted more broadly (it allows routers along the path to perform special handling of a Flow between a Source and a Destination marked with these labels), a number of our customers have asked us to add it as a dimension in our flow enrichment process. It is now available as the dimension highlighted below.

Unfortunately, as support for new networking features tends to vary across vendors, our initial support for IPv6 Flow Labels is currently limited to Juniper Networks devices.
Please do let us know if your use case warrants extending this support to other vendors by raising a feature request with your Customer Success specialist, and we'll add it to our list of work to consider for future roadmaps.

2 months ago

Interface Classification additions

Interface Classification is one of the key components of Kentik Portal. It powers the following interface-based enrichments:

Network Boundary gives users an easy way to limit queries to traffic entering or exiting the network without the risk of double-counting.
Connectivity Type adds both technical and business context to traffic moving into or out of these interfaces, making it easier to identify, for example, which interfaces are used for peering or transit at the network's edge.
Provider (or Customer) automatically enriches any traffic on these interfaces with the name of the connected provider or customer.

As Interface Classification is a load-bearing feature used throughout many of the portal workflows, including our AI Advisor (which relies on it to understand the tasks an interface performs), we have always kept the list of available values for Connectivity Types locked in.

Today we're adding three more values to Connectivity Types that our users have requested over the past years.

Management

Management Interfaces are quite self-explanatory. This Connectivity Type describes the port on a device that is connected to the Management network, which is the common network used to administer devices. It comes with a default Network Boundary of "Internal" but can be set to "External" in the case of externally based out-of-band (OOB) monitoring.

DDoS Mitigation: Cloud or Appliance

DDoS Mitigation Cloud or Appliance Connectivity Types are intended to classify interfaces that sit in front of a DDoS mitigation platform, whether it is an appliance-based internal solution (A10, Radware, Corero, etc.) or an external scrubbing DDoS Mitigation Cloud provider.

For the Appliance type, the default Network Boundary is "Internal"; for the Cloud type, it is "External." The DDoS Mitigation: Cloud Connectivity Type pairs well with the Provider/Customer Interface Classification attribute, which users can set programmatically using capture groups if a consistent Interface Description policy allows them to do so.
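As an example of the capture-group approach, suppose (hypothetically) that your descriptions for scrubbing-provider interfaces follow the pattern `SCRUB-CLOUD:<provider>:<circuit>`. The Python sketch below shows how such a rule could derive the Connectivity Type, its default Network Boundary and the provider name; the pattern, field names and dictionary layout are assumptions, and the actual rule syntax in Kentik Portal may differ.

```python
import re
from typing import Optional

# Default Network Boundary per new Connectivity Type, as described above.
DEFAULT_BOUNDARY = {
    "Management": "Internal",
    "DDoS Mitigation: Appliance": "Internal",
    "DDoS Mitigation: Cloud": "External",
}

# Hypothetical description policy: "SCRUB-CLOUD:<provider>:<circuit id>"
SCRUB_CLOUD_RULE = re.compile(r"^SCRUB-CLOUD:(?P<provider>[^:]+):")


def classify_interface(description: str) -> Optional[dict]:
    """Classify an interface from its description, or return None if no rule matches."""
    match = SCRUB_CLOUD_RULE.match(description)
    if match is None:
        return None
    connectivity_type = "DDoS Mitigation: Cloud"
    return {
        "connectivity_type": connectivity_type,
        "network_boundary": DEFAULT_BOUNDARY[connectivity_type],
        "provider": match.group("provider").strip(),  # value of the capture group
    }
```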

What's next?

As we mentioned earlier, Interface Classification is tightly controlled, as it needs to provide consistent behavior across all areas of the Kentik Portal where it is utilized. This doesn't mean we are not open to suggestions from you regarding any additional required values, especially for Connectivity Types, that help better describe the taxonomy of your network.

Do let us know if you would like us to add more of these in the future.

2 months ago

AI Week: Kentik Portal Search gets an AI assist!

In May 2025, we introduced a major update to Kentik Portal's search capabilities. Since it was well received by our users, we queued an iteration to make it even more useful to you. Back then, we had added Favorites, Most Recent Dashboards and Saved Views, and categorized lists of results matching common Portal objects such as ASNs, Devices, Interfaces, and more.

With the recent launch of Kentik AI Advisor, we’ve started weaving AI more deeply into the Kentik experience. This new release continues that journey—this time by bringing AI into how you move around Kentik.

We're excited to introduce Navigation to Search, infused with awesome Kentik AI Superpowers!

What is Navigation Search?

Kentik Portal delivers a broad set of screens and functionalities, more than the average Kentik user can memorize. While our Navigation menu has served us well all these years in presenting them to our users in an orderly fashion, a few new factors have come into play:

Users have gotten accustomed to functionalities provided by the Apps they use every day: apps like "Spotlight" on macOS, amongst others, but also a large number of SaaS apps, have made it easier for users to navigate to functionalities or applications using a central search component
AI has become the new popular kid in town, and users are now expecting to be able to prompt their way into navigating towards functionalities
Portal functionalities have moved from one section of the portal to another section, with our users sometimes struggling to follow recent changes
A lot of new customers have joined Kentik that aren't yet fully familiar with its broad array of screens and functionalities

This is where Search comes to the rescue: starting today, you can leverage our newly updated Search feature to navigate to portal screens. Whether you want faster keyboard-driven navigation or need to find a screen or feature whose name you don't fully remember, just enter what you are looking for and Search will fetch it for you and report it in the new Navigation section of the results, as shown below.

Additional cheat-code: this whole operation can be entirely piloted via keyboard shortcuts:

CMD + / (MacOS) or CTRL + / (Windows or Linux) will spawn the search box
↑ and ↓ keys will allow you to navigate the search results - while ← and → will let you switch between Favorites, Recents and Search Results tabs.
Enter will navigate to the selected result
Esc once will clear the search field, while Esc twice will both clear the search field and leave the Search context

Great, but where's the AI in there?

Having heard this from more than one prospect or customer in the past, we have become increasingly aware that Kentik Portal packs more features than we're able to teach you in the course of a trial period. It's therefore not uncommon for customers to have been toured through a feature they want to use, only to be unable to find it in our dense feature set once the trial period is over.

Here's an example:

Your Kentik Solutions Engineer toured you around the "Connectivity Costs" feature, which allows you to enter your IP Transit Contracts and track your Transit costs in Kentik Portal.
Only, you can't remember the name of the workflow: you just remember that a very neat feature was demo'd to you that allowed you to track these.

AI-Powered Navigation search to the rescue!

Another example:

You remember being told that you needed to further classify your Interfaces in Kentik in order to get more accurate Data Explorer query results, but you just can't remember what the function is called or where to access it

Again, AI-Powered Navigation Search to the rescue!
Even better, we show the main Knowledge Base article for this feature as part of the displayed Search Result!

What about the Security aspects?

Search will not return results that a user does not have access to (based on RBAC and UserLevel configuration)
Each search action kicks off multiple search jobs in parallel and appends results as they come back to the browser:
- new: A basic search process against a dictionary of all Screens with their Titles and Descriptions.
  👍 This process does not leverage AI and doesn't go through Prompting; it yields results regardless of your company's AI enablement settings
- new: An AI search based on the same Site Map. For this process, our Site Map contains a sample description paragraph for every screen in Kentik Portal, provided as additional context to the prompt
  🧠 This process is AI-powered; it is only enabled if AI is enabled for your company.
- the legacy database search against Object Instances such as Dashboards, Saved Views, ASNs, IPs, Sites, Devices ...
  👍 This process doesn't leverage AI and doesn't go through Prompting; it yields results regardless of your company's AI enablement settings
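The fan-out described above can be sketched with Python's `asyncio`: every eligible job starts at once, results are collected as each job completes, the AI job is skipped when AI is disabled for the company, and anything the user lacks permission for is dropped. The job functions and permission strings are hypothetical stand-ins, not Kentik's code.

```python
import asyncio
from typing import Awaitable, Callable

SearchJob = Callable[[str], Awaitable[list]]


async def sitemap_search(query: str) -> list:
    """Hypothetical non-AI job: match against screen titles and descriptions."""
    await asyncio.sleep(0.02)
    return [{"title": "Interface Classification", "permission": "interfaces.admin"}]


async def ai_search(query: str) -> list:
    """Hypothetical AI job: prompt with the site map descriptions as context."""
    await asyncio.sleep(0.03)
    return [{"title": "Connectivity Costs", "permission": "costs.read"}]


async def object_search(query: str) -> list:
    """Hypothetical legacy job: Dashboards, Saved Views, Devices, and so on."""
    await asyncio.sleep(0.01)
    return [{"title": "Dashboard: Edge Traffic", "permission": "dashboards.read"}]


async def run_search(query: str, jobs: dict[str, SearchJob], *, ai_enabled: bool,
                     user_permissions: set, ai_job_names=frozenset({"ai"})) -> list:
    """Start all eligible jobs in parallel and collect results as each one finishes."""
    tasks = [asyncio.create_task(job(query)) for name, job in jobs.items()
             if ai_enabled or name not in ai_job_names]  # skip AI when disabled
    results = []
    for finished in asyncio.as_completed(tasks):
        for item in await finished:
            # RBAC: never surface a result the user is not allowed to open.
            if item["permission"] in user_permissions:
                results.append(item)  # a UI would render each result at this point
    return results


if __name__ == "__main__":
    jobs = {"sitemap": sitemap_search, "ai": ai_search, "objects": object_search}
    found = asyncio.run(run_search("costs", jobs, ai_enabled=True,
                                   user_permissions={"costs.read"}))
    print([r["title"] for r in found])  # ['Connectivity Costs']
```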

4 months ago

Granular Permissions for Alerting and Protect

We're thrilled to announce a major granularity enhancement to Role-Based Access Control (RBAC) in Kentik. Gone are the days of broad, level-based access for Kentik Alerting and Protect. Build custom roles to define exactly who can create, view, update, or delete your critical alert policies, notification channels, and DDoS mitigations.

What's New?

We've rolled out a comprehensive set of permissions and custom roles specifically for Alerting and Protect. This update moves these modules into our modern RBAC framework, replacing the legacy user-level system for these features.

You can now create custom roles with specific permissions for:

Alerts: Control who can read, acknowledge, or clear alerts.
Alerting Policies: Manage permissions for creating, reading, updating, and deleting policies.
Notification Channels: Define who can create, read, and update notification channels.
DDoS Mitigation: Assign precise control over who can create, view, start, stop, and delete mitigations.
BGP Announcements: Manage who has the ability to view or withdraw BGP announcements.

Why It Matters

This is a huge step forward for security and operational efficiency. By creating custom roles, you can ensure your team members have exactly the access they need to do their jobs.

Enforce Least Privilege: Grant NOC operators the ability to acknowledge alerts without letting them change policies.
Delegate with Confidence: Allow your network security team to manage mitigations without giving them full administrator access to your entire Kentik account.
Streamline Workflows: Create roles like "Mitigation Authors" or "Alerting Policy Viewers" to match your team's specific responsibilities.
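Under the hood, a custom role can be thought of as a named set of permissions, with a user's effective access being the union of their roles. The sketch below models the NOC operator example; the role contents and permission identifiers are hypothetical and do not necessarily match the names shown on the Manage RBAC Roles page.

```python
# Hypothetical permission identifiers; Kentik's actual names may differ.
ROLES = {
    "NOC Operator": {"alerts.read", "alerts.ack"},
    "Alerting Policy Viewer": {"alert_policies.read"},
    "Mitigation Author": {"mitigations.create", "mitigations.read",
                          "mitigations.start", "mitigations.stop"},
}


def permissions_for(user_roles: list[str]) -> set[str]:
    """Union of the permissions granted by all of a user's roles."""
    granted: set[str] = set()
    for role in user_roles:
        granted |= ROLES[role]
    return granted


def can(user_roles: list[str], permission: str) -> bool:
    """Least privilege: anything not explicitly granted is denied."""
    return permission in permissions_for(user_roles)
```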

Take control of your user permissions today!

Ready to fine-tune your team's access? Administrators can head over to the Manage RBAC Roles page in your Organization Settings. From there, you can click "+ Create a Role" to start building your own custom roles with these powerful new permissions!

Your existing permissions within Alerting and Protect were all migrated to this new RBAC schema and existing role access will be unaffected by this update.

We're excited to see how you use these new controls to secure and streamline your network operations. As always, let us know if you have any feedback!

6 months ago

Traffic Costs Feature Expanded with New Traffic Slices!

We're excited to announce a major enhancement to Kentik's Traffic Costs feature, giving you even deeper insights into where and how your network spend is occurring. Two months ago we released Traffic Costs, an industry-first automated workflow enabling customers to instantly calculate how much various slices of network traffic were contributing to connectivity costs. https://new.kentik.com/unveiling-hidden-network-costs-introducing-traffic-costs-1yRCxi

And now with this exciting enhancement, you can analyze traffic costs across multiple new, powerful dimensions. The original Source/Destination ASN, AS Group, and AS Path as well as the Customer Port traffic slices are still available, and now you can analyze network spend based on:

CDN Provider: Understand costs by content delivery network to manage efficiency and performance, and negotiate better rates.
OTT Service, Provider, and Category: Get granular visibility into costs by Over-the-Top (OTT) traffic, including specific services and content categories.
Geographic Areas: Break down costs by country, region, and city to identify cost drivers by location.
IP/CIDR Blocks: Attribute costs directly to specific IP addresses or CIDR ranges for precise accounting and planning.
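To illustrate what "how much a slice contributes to cost" can mean, the sketch below splits one link's monthly bill across traffic slices in proportion to each slice's traffic volume. The pro-rata model, the function name, and the figures are all assumptions for illustration. Kentik's actual Traffic Costs calculation may weigh traffic differently.

```python
def allocate_cost(monthly_cost: float, bytes_by_slice: dict[str, int]) -> dict[str, float]:
    """Split a connectivity bill across traffic slices in proportion to volume.

    Assumption: a simple pro-rata model; the real Traffic Costs model may differ.
    """
    total = sum(bytes_by_slice.values())
    if total == 0:
        # No traffic observed: nothing to attribute.
        return {name: 0.0 for name in bytes_by_slice}
    return {name: monthly_cost * volume / total for name, volume in bytes_by_slice.items()}


# Hypothetical figures: a $10,000/month link and relative inbound volume per OTT service.
costs = allocate_cost(10_000.0, {"Netflix": 600, "YouTube": 300, "Other": 100})
```

Here the Netflix slice carries 60% of the volume and is therefore attributed $6,000 of the monthly cost. Any slice dimension (CDN, geography, CIDR block) can be plugged in the same way.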

You’ll see all the new dimensions listed under Create a New Estimate on the Traffic Costs page.

For example, in this screenshot we can easily see how much it costs my network each month to receive traffic from Netflix and deliver it to my subscribers.

And in this example, we’re looking at how much it’s costing my network to send traffic to Akamai each month.

These new traffic slices provide the actionable intelligence you need to optimize network spend across the business, improve traffic engineering, and strengthen cost accountability. Log in to the Kentik portal to explore the new capabilities today!

8 months ago

Universal Agent: Redesigning our Agents ecosystem from the ground up for better operability

In this post, we'll be covering a feature that was delivered a while back but had the gem of a long-term project hidden in it – and now is the time to talk about it. I'm talking about our (now not so) recently released Kentik NMS product – let's get back to this in a short moment.

Over the years, Kentik has built a number of Agent binaries – each one to carry out a specific function as a Telemetry Agent for its own type of telemetry.

kproxy lets you proxy flows from inside of your network to our public flow ingest cluster
kprobe is used as a DNS tap to provide the magic mapping between DNS and Flow records to unlock OTT observability
kbgp is a local BGP hub, which extends your BGP sessions towards our BGP ingest enrichment cluster
ksynth is the Synthetic Monitoring agent you run (privately) or we run (publicly), which performs Synthetic Tests

You'll notice the one missing here is our SNMP poller: it now ships as what we call a "Capability" of the Universal Agent we released when we unveiled Kentik NMS.

In a nutshell, you install Universal Agent, enable the NMS capability on it and you're off to the races. Hang in there, this is what this post is all about!

Operability challenges of Telemetry Agents

Managing large fleets of telemetry agents always comes with operational complexities – let's lay out a few observations we've made over the years in that field. In everything that follows, "operability" is a key term.

Observable agents

As your Telemetry comes to rely on these agents, they quickly become a critical part of your infrastructure, and therefore need to be observable. Some examples:

If a Flow Proxy (currently named kproxy) becomes faulty, users need to be alerted. If they aren't, they may assume the trough in their traffic charts is due to a network outage and waste valuable time troubleshooting the situation.
The team in charge of running your telemetry systems is often different from the one building and running the network. While its members may not be daily Kentik users, they need to monitor the agents at scale, with as little integration work as possible to operationalize them.
Agents running on a host (virtual or physical) can go wrong for multiple reasons: maybe the host itself is not doing well (i.e. it's not the Agent's fault), or maybe the host is doing just fine but the function the Agent performs is not. In other words, users want to be able to determine on their own why agents are not doing their job.
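The triage described in the last point can be sketched as a simple decision: check the host first, then the agent's own function. The metric names, the 0.9 saturation threshold, and the diagnose function below are hypothetical and do not represent Kentik's health model.

```python
def diagnose(host_metrics: dict[str, float], capability_ok: bool, limit: float = 0.9) -> str:
    """Decide where to look first when an agent stops doing its job.

    host_metrics holds utilisation ratios in [0, 1] (e.g. "cpu", "memory", "disk");
    capability_ok says whether the agent's own function reports healthy.
    The 0.9 threshold is an arbitrary, hypothetical default.
    """
    saturated = sorted(name for name, ratio in host_metrics.items() if ratio >= limit)
    if saturated:
        # The host is struggling, so the agent may not be at fault.
        return "host: " + ", ".join(saturated)
    if not capability_ok:
        # The host looks fine, so the function the agent performs is failing.
        return "capability"
    return "healthy"
```

Separating the two checks is what lets a telemetry team answer "is it the box or the agent?" without opening a support ticket.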

Frictionless upgrade path

When running large infrastructure, the last thing engineers want to do is upgrade a large fleet of Agents: "if it ain't broke, don't fix it" is usually the governing principle. Operational realities therefore require the upgrade path to be as frictionless as possible:

Bug fixes can require upgrading a large fleet of agents, so that task needs to be as frictionless as possible to keep operations running uninterrupted.
New features that require Telemetry Agent upgrades tend to be adopted late because of the conservative approach described above.
Security updates to large Telemetry Agent fleets can be delayed by the complexity of deploying upgrades. Since these updates are always critical, they should be seamless enough to avoid any delay.

Agent proliferation vs. one-size-fits-all

With the rise of observability, the number of agents in your infrastructure has been skyrocketing. Each new agent comes with its own upgrade track, bugs, and security context; in other words, the operational complexity of your telemetry setup grows rapidly with the number of agents required to operate your infrastructure. Yet all telemetry agents share common goals, requirements, and functions: they need to be deployed, monitored, and updated.

The first way that comes to mind to deliver these common functions is to collapse all agents into a single Swiss Army knife agent: the operational ease of this solution is appealing, but it comes with a few significant drawbacks:

Every function carried by the agent eventually requires updates, so serving many functions from a single agent usually means updating it more often. Depending on the number of functions collapsed together, this can significantly increase the update pace, and therefore the operational tax.
Each function performed by the agent comes with its own bugs and security weaknesses, so collapsing multiple agents into one increases the bug and security risk carried by each agent.
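To see why collapsing functions raises the update pace, here is a small sketch under one stated assumption: a single all-in-one agent must ship a new release whenever any of its functions does, so its update days are the union of the individual release calendars. The function names and calendars below are made up for illustration.

```python
def combined_update_days(release_days_by_function: dict[str, set[int]]) -> set[int]:
    """Days on which a single all-in-one agent must be updated.

    Assumption: the collapsed agent needs a new release whenever any of its
    functions ships a change, so its update days are the union of theirs.
    """
    days: set[int] = set()
    for release_days in release_days_by_function.values():
        days |= release_days
    return days


# Hypothetical release calendars (day of year) for three functions.
schedule = {
    "flow_proxy": {30, 120, 210, 300},
    "dns_tap": {60, 180, 300},
    "snmp_poller": {15, 45, 75, 105, 135, 165},
}
single_agent_updates = len(combined_update_days(schedule))
```

In this made-up example no individual function ships more than 6 updates a year, but the all-in-one agent has to be updated on 12 distinct days.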

For the reasons above, the ideal setup is one that reaps the benefits of a single agent while keeping multiple separate ones at the same time. Let's discuss our new approach to agentry in the next section!

Introducing Universal Agent

What is Universal Agent?

"One ~~Ring~~ Agent to rule them all, One ~~Ring~~ Agent to find them, One ~~Ring~~ Agent to bring them all, and in the ~~darkness~~ Kentik Platform bind them"

With the aforementioned challenges in mind, our engineering team produced a modular design centered around a new deployable binary, named Universal Agent.

Universal Agent acts as a host governor module (literary pun intended), offering a common foundation to the "capabilities" running under it. It acts as the sole controller towards our SaaS platform, handles the download and enablement of other agents (now named "capabilities"), manages under-the-hood update cadences for both itself and the capabilities it governs, and collects and ships to the Kentik SaaS platform not only host-level metrics, but also metrics specific to each capability.
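The governor role described above resembles a desired-state reconciliation loop, a common pattern for agent supervisors. The Python sketch below is a hypothetical model only: the class names, capability names, and version strings are assumptions, and the download and installation step is elided.

```python
from dataclasses import dataclass, field


@dataclass
class Capability:
    """A function (e.g. NMS, Syslog Server) run under the agent. Hypothetical model."""
    name: str
    version: str
    enabled: bool = False


@dataclass
class UniversalAgentSketch:
    host: str
    capabilities: dict[str, Capability] = field(default_factory=dict)

    def enable(self, name: str, version: str) -> None:
        # Download/install is elided; we only record the capability as enabled.
        self.capabilities[name] = Capability(name, version, enabled=True)

    def reconcile(self, desired_versions: dict[str, str]) -> list[str]:
        """Move every enabled capability to its desired version; return what changed."""
        updated = []
        for name, cap in self.capabilities.items():
            target = desired_versions.get(name)
            if cap.enabled and target is not None and target != cap.version:
                cap.version = target
                updated.append(name)
        return updated

    def report(self, host_metrics: dict[str, float]) -> dict[str, object]:
        """Bundle host-level metrics with per-capability state for shipping upstream."""
        return {
            "host": self.host,
            "host_metrics": host_metrics,
            "capabilities": {
                n: {"version": c.version, "enabled": c.enabled}
                for n, c in self.capabilities.items()
            },
        }


# Usage: enable two capabilities, then apply a desired state pushed by the platform.
agent = UniversalAgentSketch("agent-01")
agent.enable("nms", "1.0.0")
agent.enable("syslog", "1.0.0")
updated = agent.reconcile({"nms": "1.1.0", "syslog": "1.0.0"})
```

Keeping updates behind a single reconcile step is one way a governor can upgrade capabilities independently, so a fix to one function does not force a release of the others.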

What benefits does Universal Agent offer?

Operational peace of mind
Universal Agent is now the central piece of Kentik's Telemetry Agent strategy. Its setup process is trivial and its enrollment entirely driven by the Kentik Portal UI our users all know and love.

Furthermore, Universal Agent updates are transparently and gracefully managed "under the hood", and the same goes for any Capability run by the agent: little to no operator intervention is now needed to keep an Agent and its Capabilities up to date.

Central management & monitoring
The Settings > Universal Agents page is now the central place where you manage your entire Kentik telemetry agent ecosystem. This interface lets you identify any agent or capability deployed on your network and its current running state.

Agent Observability
Each deployed Universal Agent reports host-level metrics, accessible directly from the Settings > Universal Agent screen.

As a bonus, all agent host-level metrics are also available in Metrics Explorer under the /kentik/agent measurement tree without any extra work. Universal Agents are now observable, with their vitals available for dashboarding like any other NMS device.

One single binary to access all of our telemetry collection capabilities
Once it is deployed, Universal Agent gives instant access to all the telemetry functions we've ported over as "capabilities", which can be installed and enabled with a single click. While NMS was the initial capability we shipped Universal Agent with, our entire ecosystem of telemetry agents will follow over time and be integrated as Capabilities.

Observability for each Agent Capability
Each enabled Capability comes with its own set of metrics, designed to describe its function. These metrics are also shipped for free to our NMS subsystem and displayed at the Agent > Capability level in the Universal Agent Management UI. Again, as these metrics are stored in our Metrics subsystem, they can be accessed via Metrics Explorer and also used for alerting.

In the example above, a Universal Agent's NMS Capability shows how many Metrics Per Second it is currently handling, as well as the Network Devices it is polling.

What's next?

With this foundation built, we have already started producing new Capabilities leveraging this new model:

Our newly released Syslog Server is one of these new capabilities
As part of the same release, we also released a Trap Receiver capability

We've already started porting over our existing Agents to this new "Capability" model – watch this space for more announcements in that field real soon!

Lastly, we will be leveraging our brand new NMS Alerting platform in the very near future to provide automated alerts on Agent and Capability health.

8 months ago

Bulk edit improvements

In many areas of the Kentik Portal, users can bulk-select multiple objects to apply a common configuration to them. This becomes a requirement as soon as your infrastructure reaches a size where you want to manage it as cattle rather than pets.

A lot of functionality in Kentik Portal can be performed in batches:

Bulk actions on Devices (labeling, plan assignment, archive/deletion, Site assignment, NMS Monitoring Template assignment...)
Bulk actions on Interfaces (Assign Connectivity Type, Network Boundary, Provider/Customer, IX the interface is assigned to...)
Bulk actions on Sites (Assigning Site Market, Site Type, corresponding PeeringDB Facility...)
Bulk actions on Synthetic Agents and Synthetic tests (Labeling...)
and many more areas in Kentik Portal

As we continue to improve the ease of use and operability of our product for users with large herds of infrastructure, we've rolled out a completely new Bulk Edit UX in a limited part of the Kentik Portal to test it with our users. Read on.

A pilot for the Device Screens

As of today, this new Bulk Edit UX is visible in two areas of the product:

the Device Management screen /v4/infrastructure/devices
the Device Details > Interfaces tab /v4/infrastructure/devices//interfaces

This new UX will appear at the bottom of the devices list as soon as you select two or more devices, for example:

...and will let you modify a number of attributes for the selected devices. More actions will be added to this bulk edit menu as we hear feedback from our users.

Note how the left side of this menu also lets you deselect the devices you've selected.
Whenever possible, each attribute change displays a search field so you can immediately find the value to set.
In the case of labels, where the selected devices may not all share the same labels, a blue check is displayed when all selected devices have a given label, whereas a [-] sign shows when only some of them do, as shown below:

Here, all the selected devices have the "Arista" label.

Whereas here, only some of the selected devices have the "Arista" label.
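The label indicator described above is a tri-state value computed over the current selection. Here is a minimal sketch, assuming each selected device is represented by its set of labels. The device names and the LabelState enum are hypothetical.

```python
from enum import Enum


class LabelState(Enum):
    ALL = "check"   # every selected device has the label (blue check)
    SOME = "[-]"    # only some selected devices have it
    NONE = ""       # no selected device has it


def label_state(selected: dict[str, set[str]], label: str) -> LabelState:
    """Compute the tri-state marker for one label across the selected devices."""
    count = sum(1 for labels in selected.values() if label in labels)
    if count == 0:
        return LabelState.NONE
    if count == len(selected):
        return LabelState.ALL
    return LabelState.SOME


# Hypothetical selection of three devices.
devices = {"edge-1": {"Arista", "core"}, "edge-2": {"Arista"}, "edge-3": {"Juniper"}}
```

For this selection, "Arista" resolves to the partial [-] state, because only two of the three devices carry it.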

This UX also appears on the Device Details page as soon as more than one interface is selected:

What comes next

Based on your feedback on this new UI, we will keep improving it. More importantly, we will extend it to all other screens that currently offer bulk edit options the legacy way.

One of the key benefits of leveraging web components in our front-end stack is the ability to drastically reduce the time needed to port this new design over to other parts of the product!