kentik Kentik Product Updates logo
Back to Homepage Subscribe to Updates

Kentik Product Updates

Latest features, improvements, and product updates on the Kentik Network Intelligence Platform.

Labels

  • All Posts
  • Improvement
  • Hybrid Cloud
  • Core
  • Service Provider
  • UI/UX
  • Synthetics
  • Insights & Alerting
  • DDoS
  • New feature
  • BGP Monitoring
  • MyKentik Portal
  • Agents & Binaries
  • Kentik Map
  • API
  • BETA
  • Flow
  • SNMP
  • NMS
  • AI

Jump to Month

  • August 2025
  • July 2025
  • June 2025
  • May 2025
  • April 2025
  • March 2025
  • February 2025
  • January 2025
  • December 2024
  • November 2024
  • October 2024
  • August 2024
  • July 2024
  • June 2024
  • May 2024
  • April 2024
  • March 2024
  • February 2024
  • January 2024
  • December 2023
  • November 2023
  • October 2023
  • September 2023
  • August 2023
  • July 2023
  • June 2023
  • May 2023
  • April 2023
  • March 2023
  • February 2023
  • January 2023
  • December 2022
  • November 2022
  • October 2022
  • September 2022
  • August 2022
  • July 2022
  • June 2022
  • May 2022
  • April 2022
  • March 2022
  • February 2022
  • December 2021
  • November 2021
  • October 2021
  • September 2021
  • July 2021
  • June 2021
  • May 2021
  • March 2021
  • February 2021
  • January 2021
  • December 2020
  • October 2020
  • September 2020
  • June 2020
  • February 2020
  • August 2019
  • June 2019
  • April 2019
  • March 2019
  • February 2019
  • January 2019
  • December 2018
  • November 2018
  • September 2018
  • August 2018
  • June 2018
  • May 2018
  • April 2018
  • March 2018
  • February 2018
  • January 2018
  • December 2017
  • November 2017
  • October 2017
  • July 2017
  • June 2017
  • May 2017
  • April 2017
  • March 2017
  • February 2017
  • January 2017
  • December 2016
  • November 2016
  • October 2016
  • April 2016
ImprovementCoreNew featureAgents & Binaries
2 months ago

Universal Agent: Redesigning our Agents ecosystem from the ground up for better operability

In this post, we'll be covering a feature that was delivered a while back but had the gem of a long-term project hidden in it – and now is the time to talk about it. I'm talking about our (now not so) recently released Kentik NMS product – let's get back to this in a short moment.

Over the years, Kentik has built a number of Agent binaries – each one to carry out a specific function as a Telemetry Agent for its own type of telemetry.

  • kproxy lets you proxy flows from inside of your network to our public flow ingest cluster
  • kprobe is used as a DNS tap to provide the magic mapping between DNS and Flow records to unlock OTT observability
  • kbgp is a local BGP hub, which prolongs your BGP sessions towards our BGP ingest enrichment cluster
  • ksynth is the Synthetic Monitoring agent you run (privately) or we run (publicly), which performs Synthetic Tests

You'll notice the one missing here is our SNMP poller: you now see it as what we call a "Capability" of the Universal Agent we released when we unveiled Kentik NMS.

In a nutshell, you install Universal Agent, enable the NMS capability on it and you're off to the races. Hang in there, this is what this post is all about!


Operability challenges of Telemetry Agents

Managing large fleets of telemetry agents always comes with operational complexities – let's lay out a few observations we've made over the years in that field. In everything that follows, "operability" is a key term.

Observable agents

As your Telemetry comes to rely on these agents, they quickly become a critical part of your infrastructure, and therefore now require to be observable – some examples here:

  • If a Flow Proxy (currently named kproxy) becomes faulty, users need to be alerted. If they don't, they will assume the trough in their traffic charts is due to a network outage and waste valuable time troubleshooting the situation.
  • The team in charge of running your telemetry systems is often a different team than the one building and running the network – while they may not be daily Kentik users, they need to monitor them in a scalable way and reduce the amount of integration work needed to operationalize them. 
  • Agents running on a host (virtual or physical) can go wrong for multiple reasons: maybe the host itself is not doing well (i.e. it's not the Agent's fault), maybe the function the Agent performs is not doing well, but the host is doing just fine. In other words, users want self-serviceability when it comes to determining why agents are not doing their job.

Frictionless upgrade path

When running large infrastructure, the last thing engineers want to do is have to upgrade a large fleet of Agents: "if it ain't broke, don't fix it" is usually the governing principle. Operational realities require the upgrade path to be the most frictionless possible:

  • Bug fixes can require upgrading a large fleet of agents – the task of upgrading a large fleet of agents therefore needs to be as frictionless as possible to maintain constant state of operation.
  • Availability of new features requiring Telemetry Agent upgrades tend to be delayed in favor of the aforementioned conservative approach.
  • Security updates to large Telemetry Agent fleets can get delayed because of upgrades deployment complexity – these are always critical, should always be seamless enough to not incur delays.

Agent proliferation vs. One-size fits all

With the rise of observability, agent proliferation in your infrastructure has been skyrocketing. Each new agent comes with its own upgrade track, bugs, security context... in other words, the operational complexity of one's telemetry setup increases exponentially with the number of agents required to operate one's infrastructure. All telemetry agents share common goals, requirements, and functions: they need to be deployed, monitored, and updated.

The first way that comes to mind to deliver these common functions is to collapse all agents into a single swiss army knife agent: the operational ease of this solution is appealing, but comes with a few significant drawbacks:

  • All functions carried by the agent require eventual updates, and having many functions served by a single agent usually results in increasing the frequency at which these need to be updated – depending on the number of functions collapsed together, this often results in a significant increase of update pace, therefore operational tax.
  • Each function performed by the agent comes with its own bugs and security weaknesses – collapsing multiple agents in one often result in increasing the bug and security risk per agent.

For the reasons above, the ideal setup is one where we can reap the benefits of both a single agent, while keeping multiple ones at the same time. Let's discuss our new approach to agentry in the next section!

Introducing Universal Agent

What is Universal Agent ?

"One Ring Agent to rule them all, One Ring Agent to find them, One Ring Agent to bring them all, and in the darkness Kentik Platform bind them"

With the aforementioned challenges in mind, our engineering team produced a modular design centered around a new deployable binary, named Universal Agent.

Universal Agent acts as a host governor module (literary pun intended), tasked with offering a common foundation to "capabilities" running under it: it acts as the sole controller towards our SaaS platform, handles the download and enablement of other agents (now named "capabilities"), handles under-the-hood update cadences for both itself and its governed capabilities, and collects/ships not only host-level metrics, but also specific metrics for each capabilities to the Kentik SaaS platform.

What benefits does Universal Agent offer ?

Operational peace of mind
Universal Agent is now the central piece of Kentik's Telemetry Agent strategy. Its setup process is trivial and its enrollment entirely driven by the Kentik Portal UI our users all know and love.

Furthermore, Universal Agent updates are transparently and gracefully managed "under the hood", and the same goes for any Capability run by the agent – little if no operator intervention is now needed to keep an Agent and its Capabilities up to date.

Central management & monitoring
The Settings > Universal Agents now becomes the central place where you will in turn manage your complete Kentik telemetry agent ecosystem. This interface lets you identify any agent or capability deployed on your network and its current running state.

Agent Observability
Each deployed Universal Agent reports host-level metrics, accessible directly from the Settings > Universal Agent screen

As a bonus, all agent host-level metrics are also available in Metrics Explorer under the /kentik/agent measurement tree without any extra work needed. Universal Agents have now become observable, with their vitals now available for dashboarding like any other NMS device.

One single binary to access all of our telemetry collection capabilities
Once it is deployed, Universal Agent gives instant access to all the telemetry functions we've ported over as "capabilities". These get installed and enabled upon simple click. While NMS was the initial capability we shipped Universal Agent with, our entire ecosystem of telemetry agents will follow over time and be integrated as a Capability.

Observability for each Agent Capability
Each enabled Capability comes with its own set of metrics, designed to describe its function. These metrics also get shipped for free to our NMS subsystem and displayed at Agent > Capability level in the Universal Agent Management UI. Again, as these metrics are being stored in our Metrics subsystem, they can be accessed via Metrics Explorer, but also alerted upon.


In the example above, an Universal Agent's NMS Capability will show how many Metrics Per Second it is currently handling, as well as the Network Devices it is polling.

What's next ?

With this foundation built, we have already started producing new Capabilities leveraging this new model:

  • Our newly released Syslog Server is one of these new capabilities
  • As part of the same release, we also released a Trap Receiver capability

We've already started porting over our existing Agents to this new "Capability" model – watch this space for more announcements in that field real soon!

Lastly, we will be leveraging our brand new NMS Alerting platform in the very near future to provide automated alerts on Agents and Capabilities Health.

Avatar of authorGreg Villain