CDN Analytics update
Our CDN Analytics workflow relies on a detection engine which both takes into account your own Kentik configuration (via Interface Classification) and its constant scanning of the internet to detect IPs from CDN Servers out on the Internet. In this release, we added means for our users to help it better detect, or take into account undetected CDN caches that they are embedding in their networks.
Important note: New configuration possibilities offered by this workflow update are highly advanced - if you feel like you need to use this feature, please seek advice from your Customer Success Engineer.
Understanding Kentik's CDN detection engine
Kentik enriched flow data includes src_cdn
and dst_cdn
fields, and these are accessible in Data Explorer as well as the namesake workflow CDN Analytics, they are also leveraged within the OTT Service Tracking workflow.
Values for these flow columns come from Kentik’s special sauce: every day a complex engine runs and evaluates multiple data-sets to map as many IPs as possible to CDNs. src_cdn
and dst_cdn
are then populated in flow records using this global (<IP>,<CDN>)
mapping table.
Datasets used by the engine for classification
CDNs can be a mix of servers either hosted on the CDN’s own network infrastructure, or inside of other networks. While servers hosted on the CDN’s own infrastructure are somewhat easy to detect, last mile embedded caches aren’t, because they’re addressed with the embedding broadband ISP’s address space.
The CDN detection engine uses a combination of the methods below, when it runs every day:
- Internet Scans data
Determines how public CDN cache server IPs are detected, as well as a portion of last mile embedded caches.
This process is not detailed in the present article. - Kentik Customer data
This mechanism helps with detecting IPs of embedded caches that are not identified by the previous datasets.
This detection mechanism is based on a customer’s Interface Classification accuracy.
The engine pulls and inspects all interfaces whose Connectivity Type is set to Embedded Cache and look at the Provider value for the interface. We perform a loose match with the Provider field then assigns the interface's subnet to the CDN that has been matched with our database.
This example shows an Interface Classification Rule which will result in the associated interfaces' subnets to be picked up by the CDN detection engine and associated to a CDN.
Corner cases for Embedded Cache detection
While this approach has shown good results, some customer setups make it difficult to entirely (or sometimes correctly) detect Embedded Cache IPs. The examples below detail such corner cases.
Situation #1
Here, the interface tagged Embedded Cache is actually a /31 or /30 point-to-point subnet to connect with another Network Device on which the Embedded Caches are located, with the aggravation that this Network Device is not sending Flow Telemetry to Kentik.
In this case, two things will happen:
- the engine will mislabel the /30 or /31 to a CDN if its provider field is set and matched (this happens for Facebook FNA caches, CDN77 caches... which often come with a router shipped with the caches by the CDN provider)
- the engine will be missing the Caching Server Subnet below for not having any Interface Classification from the non-kentik-registered router.
Situation #2
This situation is pretty similar to the previous one, with the additional complication that the interface sending flow data is not tagged as Embedded Cache by the Interface Classification rules.
In this instance above, the CDN Engine cannot detect the subnet where the cache servers are, because it doesn't have an Embedded Cache interface to pickup the subnet from.
Solving for these situations: the new CDN Analytics configuration screen
Inspecting and amending CDN Engine detected cache entries
To remedy both of the aforementioned situations, the Configuration button in the CDN Analytics workflow will allow for fine grain additional configuration.
The first tab of this configuration screen, Detected Embedded Caches, will:
- List all the subnets detected from Interface Classification and the associated Provider values, as well as the CDN that's been assumed from the Provider value
- Let the user disable those that are tagged Embedded Cache but whose subnet is not a cache subnet (example of Situation #1)
- Add a CDN Provider to any interfaces labeled Embedded Cache where the Provider value hasn't been set, therefore which the CDN Engine couldn't make a CDN attribution decision on.
Manually declaring undetected Embedded Caches
To also address the situations of having caches behind non-Embedded Cache classified interfaces, and therefore undetected by the CDN Detection Engine, users have the option to manually declare CIDR blocks and assign them to CDNs, via the Additional Embedded Caches tab of this configuration screen.
Entries in this configuration section will now be picked-up with every daily run of the CDN Detection Engine and will be added to the global (<IP>,<CDN>)
dataset in use by all our customers.
Understanding how the CDN Analytics workflow leverages your Network Telemetry
Bolting src_cdn and dst_cdn values into your enriched flows is not the only thing that needs to happen for your CDN Analytics workflow to display accurate data.
The workflow needs to inspect the right flow data from your Network Devices to deliver a cohesive analysis of the CDN traffic that not only enters your network, but also is generated from embedded caches within it. To do so, this workflow uses this generic filter:
As you will notice, this filter will work perfectly as long as all of the Embedded Caches in your network are connected to a router exporting flow data to Kentik. When this is not the case, the flow data to include will simply not be collected by Kentik from the interface facing the cache.
it is important to understand that mapping Cache Server IPs and selecting all the cache related flow telemetry are two aspects that need to work for the CDN Analytics workflow to display accurate results.
In the previously discussed Situation #1 and Situation #2, flows coming from the cache servers will not be considered by the filter displayed above, therefore we need ways to add these to the filter-set that regulates what flows the CDN Analytics workflow will consider.
Tuning the CDN Analytics workflow to include non Embedded Cache traffic
Let's go back to Situation #2
Now we want the CDN Workflow to include traffic from the Caching Server Subnet to the data it will display in its analysis. This will be done by using the Advanced Filtering section of the CDN Analytics configuration options:
As you can see in the screenshot above, traffic from eth0/0/1 on the edge-01 network device coming from the Cache Subnet would not normally be caught by the default filter, adding it modifies the CDN Analytics behavior to include it now.