New Universal Agent Capability: OTT DNS Tap
In a previous announcement, we introduced Universal Agent as a foundational piece of software to further operationalize and unify our collection of telemetry agents under a single umbrella. With the benefits of this approach, we are hard at work porting all of our existing collection agents towards this new paradigm as Universal Agent Capabilities.
Today, we will be talking about our OTT Service Tracking DNS tapping agent, and how as an existing OTT Service Tracking user you can migrate these DNS taps at no operational cost and start benefiting as early as today from their highly improved operability. Read on!
OTT Enrichments, how do they work?
First, let's review how Kentik's OTT Service Tracking functionality works. Unlike DPI (Deep Packet Inspection), which requires you to deploy DPI hardware at your network edge to map the applications your subscribers consume, Kentik offers a creative, lightweight, and operationally and financially efficient way to perform the same task. Users deploy a DNS Tapping Agent in addition to exporting network flow telemetry from their devices. Our True Origin engine then maps DNS query responses to traffic, based on an ever-growing library of domain name patterns, and colors this flow telemetry directly with OTT-describing attributes: OTT Service, OTT Category, and OTT Provider.
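To make the idea concrete, here is a minimal Python sketch of this kind of DNS-based enrichment. It is an illustration under stated assumptions, not Kentik's True Origin implementation: the `OttLabel`, `DnsCache`, and `label_for_domain` names, the glob-style patterns, and the example domains and IP addresses are all hypothetical.

```python
from fnmatch import fnmatch
from typing import NamedTuple, Optional


class OttLabel(NamedTuple):
    service: str
    category: str
    provider: str


# Hypothetical pattern library: domain glob -> OTT attributes.
# These entries are made up for illustration only.
PATTERNS = [
    ("*.video.example.com", OttLabel("ExampleFlix", "Video", "Example Inc")),
    ("*.cdn.example.net", OttLabel("ExampleTunes", "Audio", "Example Net")),
]


def label_for_domain(domain: str) -> Optional[OttLabel]:
    """Return the first label whose glob matches the queried domain."""
    name = domain.rstrip(".").lower()  # drop the trailing root dot, normalize case
    for pattern, label in PATTERNS:
        if fnmatch(name, pattern):
            return label
    return None


class DnsCache:
    """Remembers which domain a subscriber resolved to reach a server IP."""

    def __init__(self) -> None:
        self._seen: dict[tuple[str, str], str] = {}

    def observe_response(self, client_ip: str, domain: str, answers: list[str]) -> None:
        # One DNS response may carry several answer addresses.
        for server_ip in answers:
            self._seen[(client_ip, server_ip)] = domain

    def enrich_flow(self, client_ip: str, server_ip: str) -> Optional[OttLabel]:
        # A flow is labeled only if this client previously resolved this server.
        domain = self._seen.get((client_ip, server_ip))
        return label_for_domain(domain) if domain else None


cache = DnsCache()
cache.observe_response("10.0.0.7", "edge1.video.example.com.", ["203.0.113.10"])
print(cache.enrich_flow("10.0.0.7", "203.0.113.10"))
```

The sketch keys the lookup on the (client, server) address pair so that a flow is only labeled when the same subscriber actually resolved that server. A real system would presumably also need to expire entries and handle shared CDN addresses, which this sketch leaves out.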
Until today, the DNS tap collection was instrumented via our former host monitoring agent kprobe in a specific mode that did not export host telemetry, but only DNS query/responses. The drawbacks of this legacy approach are:
- kprobe is not observable in Kentik Portal, both from a DNS tapping activity metrics standpoint and an out-of-the-box alerting standpoint
- kprobe upgrades are manual, requiring deployment for each new version
- combining host monitoring and DNS tapping in the same telemetry agent introduces a shared bug surface between two very different functions
- in cases where kprobe couldn't be installed on a DNS resolver, deploying it as a DNS tap required a complicated launch command
Taking a hard look at these constraints, we are now happy to offer a much more operable and easy-to-deploy solution via a new Universal Agent capability, so let's look at the benefits now!
So what's new?
Trivial deployment of DNS taps, under-the-hood upgrades
As of today, all Universal Agents deployed offer a new capability aptly named DNS OTT Tap which replaces kprobe's legacy role of conveying DNS query/responses to our flow ingest clusters for OTT-related flow enrichments. Installing it will download the capability's core binary and enable it.
Once the capability is enabled, users can configure its few parameters, and the Universal Agent host will keep the OTT DNS Tap capability at its latest version with no further operational attention needed.
Easy promiscuous mode
You can now select a specific host interface to capture DNS queries and responses. In addition, if you’re using port mirroring, port spanning, or tunneling to send this traffic from the server-facing port to another host, you can enable Promiscuous Mode on that interface to capture it, as shown in the diagram below.
OTT DNS Tap metrics
Every Universal Agent capability comes with its own set of metrics. The OTT DNS Tap is no exception to this principle: clicking on the capability's [Details] button will show two charts, one for the number of DNS query/responses funneled by the capability to Kentik's ingest clusters, and a second for the number of query/responses discarded, to monitor for any issue related to the capability's specific job.
As can be seen in both screenshots, both metrics are instantly available in Metrics Explorer for further reporting, so that administrators of the DNS Tap fleet can quickly troubleshoot. Here's an example of a single Metrics Explorer query showing the number of query/responses per second that an entire fleet of DNS Taps is handling:
Finally, we've improved the [Configuration] screen of our Service Provider > OTT Service Tracking workflow to now include all deployed OTT DNS Taps along with their agent health status.
What does the migration path to the OTT DNS Tap Universal Agent Capability look like?
The process to switch from a standalone kprobe setup to Universal Agent's OTT DNS Tap capability couldn't be safer or simpler. It consists of the following steps:
- On each DNS server where kprobe is currently running, deploy Universal Agent. (Knowledge Base Article)
The process is trivial: enter the command line on the server's shell and follow the instructions until Kentik Portal offers to register the newly detected agent.
- Once Universal Agent is installed successfully on the DNS server, install the OTT DNS Tap capability. (see Knowledge Base entry here)
- Configure the OTT DNS Tap capability to your liking – default settings should cover most of the installs.
- At this point, both kprobe and the OTT DNS Tap will be sending the same data to Kentik's DNS ingest cluster, which does not affect the OTT enrichment data at all.
- Verify that the OTT DNS Tap capability is receiving DNS Query/Responses from the capability's drawer in the Universal Agent UI. (see screenshot in the OTT DNS Tap metrics paragraph in this post)
- 🎓 Congratulations, you are done: you can now safely uninstall kprobe and proceed to the next DNS server.
The simplicity of the migration path lies in the fact that both kprobe and the new Universal Agent capability can coexist without causing any OTT Flow enrichment issues.
👌 So, go ahead and migrate your kprobe instances right away and benefit from the improved observability of our Universal Agent as soon as today!
Note: if you are in doubt as to whether the kprobe instance running on a DNS server is used as a Legacy DNS Tap or to generate host flow telemetry, the following command on the host will help disambiguate. If it yields any result, then there's a kprobe running on this instance that needs to be replaced with a Universal Agent OTT DNS Tap capability:
ps auxw | grep kprobe | grep dns
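If you would rather script this check across several servers, the same filter can be sketched in Python. This is only an illustrative equivalent of the command above: the function names are hypothetical, and it relies on the same heuristic (a process line mentioning both "kprobe" and "dns"), while also skipping any `grep` processes that happen to appear in the listing.

```python
import subprocess


def find_legacy_dns_taps(ps_lines):
    """Keep process lines mentioning both 'kprobe' and 'dns', skipping grep itself."""
    return [
        line for line in ps_lines
        if "kprobe" in line and "dns" in line and " grep " not in f" {line} "
    ]


def running_processes():
    """Return the lines of `ps auxw`, the same listing the command above filters."""
    result = subprocess.run(["ps", "auxw"], capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

Calling `find_legacy_dns_taps(running_processes())` on a host returns the matching process lines; an empty list means no kprobe DNS tap was found by this heuristic.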
What comes next?
In one of our next releases, we'll be adding out-of-the-box alerting for both Universal Agents and capabilities, sending you notifications whenever your fleet of telemetry agents is encountering issues.
In addition, we have a really neat slate of improvements that we are also going to bring to life in the near future, amongst others: new agents such as Flow Proxy (fka kproxy) will be ported over under Universal Agents, as well as some large scale deployment options, and also an initial set of HA (High Availability) options – so watch this space!