Adding detection for two new CDNs: Edgevana and Valve
CDN Analytics is one of the most valued features in our platform, and for good reason. With 75% to 85% of a broadband ISP’s inbound traffic routinely being carried by just five major CDNs, understanding CDN-originated traffic is critical for managing subscriber performance and network capacity.
However, CDNs vary widely in how they operate. They range from massive commercial entities to single-purpose networks built exclusively for content providers (like Netflix’s Open Connect, Facebook's Network Appliances, or Google Global Caches). Because of this complexity, automatically enriching traffic data based on its true CDN origin is both an art and a science. It requires a smart detection engine – the core of our True Origin technology – that keeps evolving.
Additionally, the relationship between a broadband ISP and a CDN often extends beyond standard interconnection agreements, particularly when CDNs offer Cache Embedding programs. When caching appliances are co-located in the ISP's last mile, the ISP needs to verify the ROI: measuring the efficiency of that last-mile caching against the real-world costs of hosting, cooling, and powering the hardware.
Today, our detection engine grows again to help support these use cases. We are adding detection for Edgevana and Valve to our CDN Analytics, bringing our list of detected commercial and purpose-built CDNs to 65!
Let’s double-click into what this means for your network.
How does Kentik's True Origin CDN detection engine work?
The theory is fairly simple: flow telemetry comes in with source and destination IPs, which we map in real time to source or destination CDNs to enrich the flow records we ingest. To make this possible, our CDN Detection Engine constantly maintains a list of CDN-to-IP mappings behind the scenes, built from a number of datasets:
- Multiple passive DNS datasets
- Our customers' real-time DNS data feeds
- Routing registry data for prefixes originated by CDN ASNs
- Interface classification-based detection of 'Embedded Cache' type interfaces
- Configuration inputs from our customers declaratively adding Embedded Caches
This data is parsed every day using a set of heuristics, which allows Kentik to generate a unique and highly accurate dataset of IP-to-CDN mappings that includes 1) IPs for servers in each CDN's infrastructure, and 2) last-mile-embedded servers in the broadband ISPs of both customers and non-customers.
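To make the real-time enrichment step concrete, here is a minimal Python sketch (not Kentik's implementation) that tags a flow record's source and destination IPs with a CDN via longest-prefix match against an IP-to-CDN table. The table contents, the `ExampleCDN-*` names, and the `src_cdn`/`dst_cdn` field names are hypothetical, and the prefixes are documentation ranges rather than real CDN allocations.

```python
import ipaddress
from typing import Optional

# Hypothetical IP-to-CDN table standing in for the daily-generated dataset.
# Prefixes are documentation ranges (RFC 5737 / RFC 3849), not real CDN space.
CDN_PREFIXES = {
    "192.0.2.0/24": "ExampleCDN-A",
    "192.0.2.128/25": "ExampleCDN-B",  # more-specific prefix should win
    "2001:db8::/32": "ExampleCDN-C",
}

# Parse once and sort longest prefix first, so the first hit is the best match.
_NETWORKS = sorted(
    ((ipaddress.ip_network(p), cdn) for p, cdn in CDN_PREFIXES.items()),
    key=lambda item: item[0].prefixlen,
    reverse=True,
)


def lookup_cdn(ip: str) -> Optional[str]:
    """Return the CDN owning `ip` by longest-prefix match, or None."""
    addr = ipaddress.ip_address(ip)
    for net, cdn in _NETWORKS:
        # Membership is simply False when IPv4/IPv6 families differ.
        if addr in net:
            return cdn
    return None


def enrich_flow(flow: dict) -> dict:
    """Return a copy of the flow with hypothetical src_cdn/dst_cdn fields."""
    enriched = dict(flow)
    enriched["src_cdn"] = lookup_cdn(flow["src_ip"])
    enriched["dst_cdn"] = lookup_cdn(flow["dst_ip"])
    return enriched
```

The linear scan keeps the sketch short; at flow-ingest rates, a radix tree or similar prefix trie would be the more natural data structure.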
Because we leverage passive DNS data feeds and Interface Classification, setup is kept minimal for our customers: most of the embedded caches in their network are detected automatically. When they are not (for example, when these servers sit behind a third-party router that isn't registered with Kentik), a configuration interface lets customers declare them statically.
What happens when we add a new CDN to our detection engine?
Train the detection engine on CNAMEs
This means feeding the engine patterns of unique CNAMEs to look for as it crawls our passive DNS data feeds. We're talking about well-known CNAME patterns such as `*.akamaized.net` or `*.fastly.net`, which CDN customers point their own domain names at so that these CDNs can serve their content.
This data is further fed to the engine via our OTT DNS Taps, which capture DNS queries upstream of your subscribers.
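A minimal Python sketch of how CNAME patterns like these could turn passive DNS observations into IP-to-CDN mappings. The record layout (`cname_chain`, `answers`), the pattern table, and the function names are assumptions for illustration; a real pipeline would also need heuristics for shared or multi-CDN hostnames.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional

# Hypothetical pattern table. The two patterns come from the text above;
# everything else about the structure is an assumption.
CNAME_PATTERNS = {
    "Akamai": ["*.akamaized.net"],
    "Fastly": ["*.fastly.net"],
}


def match_cdn(cname: str) -> Optional[str]:
    """Return the CDN whose pattern matches this CNAME, if any."""
    # DNS names are case-insensitive and may carry a trailing root dot.
    name = cname.lower().rstrip(".")
    for cdn, patterns in CNAME_PATTERNS.items():
        if any(fnmatchcase(name, p) for p in patterns):
            return cdn
    return None


def mappings_from_pdns(records: List[dict]) -> Dict[str, str]:
    """Return {answer IP: CDN} from hypothetical passive DNS records.

    Each record is assumed to hold a 'cname_chain' list and an 'answers'
    list of the A/AAAA IPs the chain finally resolved to.
    """
    out: Dict[str, str] = {}
    for rec in records:
        for cname in rec["cname_chain"]:
            cdn = match_cdn(cname)
            if cdn is not None:
                for ip in rec["answers"]:
                    out[ip] = cdn
                break  # first matching CNAME in the chain decides
    return out
```

For example, a lookup of `media.example.com` whose chain passes through `media.example.akamaized.net.` would map its answer IPs to Akamai, while a name with no CDN CNAME in its chain contributes nothing.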
Wire the ASN-to-CDN mappings using Routing Registries
The other straightforward source of data is the set of ASNs a given CDN uses to run its infrastructure: all prefixes originated by these ASNs are turned into CDN-to-IP mappings. Choosing those ASNs takes some care. For instance, Akamai is well known to operate AS20940, but it has successively acquired other companies such as Prolexic, Instart Logic, and Linode, and not all of their associated ASNs are used to deliver CDN traffic to users. In the same vein, Netflix is publicly known to operate its Open Connect cache network both at IXes, on its own servers via AS2906, and via embedded caches on AS40027. This updated list is parsed every day and added to the IP-to-CDN mappings from the day before.
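A minimal Python sketch of the ASN-based step, assuming routing data arrives as `(prefix, origin_asn)` pairs and that only ASNs known to deliver CDN traffic are configured. AS20940, AS2906, and AS40027 come from the text; the function name, the input format, and the prefixes (documentation ranges) are hypothetical.

```python
import ipaddress
from typing import Dict, Iterable, Optional, Tuple

# Only delivery ASNs are listed; acquired ASNs that don't serve CDN
# traffic are deliberately left out of this hypothetical config.
CDN_DELIVERY_ASNS = {
    "Akamai": {20940},
    "Netflix": {2906, 40027},
}

# Invert once so each origin ASN resolves to its CDN with a dict lookup.
ASN_TO_CDN = {asn: cdn for cdn, asns in CDN_DELIVERY_ASNS.items() for asn in asns}


def daily_asn_mappings(
    routes: Iterable[Tuple[str, int]],
    previous: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Add today's prefix -> CDN mappings on top of the previous day's."""
    mappings = dict(previous or {})
    for prefix, origin_asn in routes:
        cdn = ASN_TO_CDN.get(origin_asn)
        if cdn is not None:
            # strict=False normalizes entries that carry host bits.
            mappings[str(ipaddress.ip_network(prefix, strict=False))] = cdn
    return mappings
```

Carrying the previous day's mappings forward mirrors the additive daily update described above; expiring stale prefixes is out of scope for this sketch.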
We also add metadata to every CDN entry in the engine's directory service so that we can give users added context for a given CDN, such as:
- The CDN's public logo, to embellish its detail page
- CDN type: whether it is a Commercial CDN, a purpose-built content CDN, or a Cloud Provider
- Its associated ASNs and PeeringDB entries
- Whether it runs an Embedded Cache Program (such as Netflix's OCA, Facebook's FNA, Google's GGC, Akamai's AANP, Apple's AEC...), along with a link to each program's web page
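One way to picture such a directory entry is as a small record type. The Python sketch below is hypothetical: the class, field, and enum names are illustrative and do not describe Kentik's actual schema. Only Netflix's ASNs and its OCA program are taken from the text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class CdnType(Enum):
    COMMERCIAL = "commercial"
    PURPOSE_BUILT = "purpose-built"
    CLOUD_PROVIDER = "cloud provider"


@dataclass
class CdnDirectoryEntry:
    """Hypothetical shape of one CDN entry in the directory service."""
    name: str
    cdn_type: CdnType
    asns: List[int] = field(default_factory=list)
    peeringdb_ids: List[int] = field(default_factory=list)
    logo_url: Optional[str] = None
    embedded_program: Optional[str] = None      # e.g. "OCA"
    embedded_program_url: Optional[str] = None  # link to the program's page

    @property
    def runs_embedded_program(self) -> bool:
        return self.embedded_program is not None


# PeeringDB IDs and URLs are left empty rather than guessed.
netflix = CdnDirectoryEntry(
    name="Netflix",
    cdn_type=CdnType.PURPOSE_BUILT,
    asns=[2906, 40027],
    embedded_program="OCA",
)
```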
Here's what the /service-provider/cdn/Valve details page looks like:
Wire the detection engine to Interface Classification
For the CDNs running an Embedded Cache Program, we built our engine to detect as many of the IPs of these cache servers as possible, even when they sit in your own last-mile network rather than in the CDN's ASNs.
One way our engine does this is by looking at SNMP interface data from your network devices and inspecting the interfaces with an Embedded Cache Connectivity Type. For each of those, the engine takes the IP subnet the interface is on and, if it can match a CDN from the interface description or the Provider Interface Classification field, assigns an additional IP-to-CDN mapping.
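A minimal Python sketch of that interface-based step. The `Interface` record, the `KNOWN_CDNS` list, the substring matching, and the choice to check the provider field before the description are all assumptions for illustration; the text only says a CDN is matched from either field.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical list of CDN names to look for in interface metadata.
KNOWN_CDNS = ["Netflix", "Akamai", "Google", "Facebook", "Valve", "Edgevana"]


@dataclass
class Interface:
    description: str           # SNMP interface description
    connectivity_type: str     # e.g. "Embedded Cache"
    provider: Optional[str]    # Provider Interface Classification field
    ip_interface: str          # interface address with mask, e.g. "192.0.2.1/29"


def match_cdn_name(text: str) -> Optional[str]:
    """Naive case-insensitive substring match against known CDN names."""
    lowered = text.lower()
    for cdn in KNOWN_CDNS:
        if cdn.lower() in lowered:
            return cdn
    return None


def embedded_cache_mappings(interfaces: List[Interface]) -> Dict[str, str]:
    """Map the subnet of each matched Embedded Cache interface to its CDN."""
    mappings: Dict[str, str] = {}
    for intf in interfaces:
        if intf.connectivity_type != "Embedded Cache":
            continue
        cdn = match_cdn_name(intf.provider) if intf.provider else None
        if cdn is None:
            cdn = match_cdn_name(intf.description)
        if cdn is not None:
            # The subnet the interface sits on, e.g. 192.0.2.1/29 -> 192.0.2.0/29.
            subnet = ipaddress.ip_interface(intf.ip_interface).network
            mappings[str(subnet)] = cdn
    return mappings
```

Substring matching is deliberately simple here; interface descriptions in the wild are free-form, so a production matcher would likely need aliases and stricter tokenization.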
In the CDN Analytics > Config section of the portal, you can see which caches are automatically detected (the list updates once a day), as depicted below:
Add a way for users to self-declare IP CIDR-to-CDN mappings
Some content CDNs (a.k.a. purpose-built CDNs) that run a last-mile Cache Embedding program have been known to provide the broadband ISP with a router in addition to the caches. The CDN then retains admin control over this router, which therefore cannot be registered with Kentik.
The consequence is that Embedded Cache interfaces cannot be auto-detected, since they are located on a device not registered in Kentik. In this case, the ISP's users will want to manually declare the CIDR space used to number the caches, which they can do in the Additional Embedded Caches tab of the CDN configuration workflow.
When adding a new CDN with a last-mile Embedded Cache program, we ensure that it is added to the manual addition config screen for users to start entering their static mappings.
The following Cache Embedding CDNs are currently covered: Akamai, Amazon, Apple, Azion (Brazil), Baishan Cloud, ByteDance (TikTok), CDN77, Cloudflare, Edgevana, Facebook, G-Core, Globo, Google, i3D, Izzi, Limelight, Lumen/Level(3), Azure, Netflix, Netskrt, Quantil/China NetCenter, Qwilt, Valve.
The IP-to-CDN mappings entered through this configuration form are then added to the daily process that generates the global CDN-to-IP mappings.
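A minimal Python sketch of validating user-declared CIDRs and merging them into the daily mappings. The function names, the `EMBEDDING_CDNS` subset, and the rule that a declaration overrides an auto-detected entry for the same prefix are assumptions, not documented behavior.

```python
import ipaddress
from typing import Dict, List, Tuple

# Hypothetical subset of the Cache Embedding CDNs listed above.
EMBEDDING_CDNS = {"Akamai", "Netflix", "Valve", "Edgevana", "Google", "Facebook"}


def validate_declaration(cidr: str, cdn: str) -> str:
    """Return the normalized CIDR, or raise ValueError for a bad entry."""
    if cdn not in EMBEDDING_CDNS:
        raise ValueError(f"{cdn} is not a Cache Embedding CDN")
    # strict=True (the default) rejects entries with host bits set,
    # such as "192.0.2.1/24", instead of silently rounding them.
    return str(ipaddress.ip_network(cidr, strict=True))


def merge_daily(auto: Dict[str, str], declared: List[Tuple[str, str]]) -> Dict[str, str]:
    """Merge static declarations into auto-detected mappings (declarations win)."""
    merged = dict(auto)
    for cidr, cdn in declared:
        merged[validate_declaration(cidr, cdn)] = cdn
    return merged
```

Rejecting host bits, rather than normalizing them, surfaces likely typos to the user instead of quietly mapping a different range.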
Adding a CDN Offload widget to the landing page
While last-mile Cache Embedding provides many benefits to the broadband ISP, it also comes with an operational cost in rack space, power, and cooling. Because of this, the embedding broadband ISP will often want to measure the benefits of each embedding CDN to continuously justify the decision to embed.
The key metric for the main benefit is offload: the ratio between how much traffic is delivered to subscribers from the cache and how much traffic is delivered to subscribers in total, whether from a cache or from outside the network. This Offload Ratio is commonly expressed as a percentage. Traditionally, CDNs that carry a constantly growing long tail of user-generated content tend to have lower Offload Ratios, as popular content competes for the limited storage space offered by the embedded appliances. Conversely, CDNs with a fixed catalog (Netflix being a prime example, since the size of its catalog is somewhat fixed at any given time) can achieve very high offload with a small cache footprint.
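The Offload Ratio definition above translates directly into a small function. This is a minimal Python sketch with a hypothetical function name; the traffic figures are illustrative, not measurements.

```python
def offload_ratio(from_cache: float, from_outside: float) -> float:
    """Offload Ratio as a percentage.

    from_cache:   traffic delivered to subscribers from embedded caches
    from_outside: traffic delivered to subscribers from outside the network
    Both values must use the same unit and time interval (e.g. Gbps).
    """
    total = from_cache + from_outside
    if total <= 0:
        raise ValueError("no traffic delivered to subscribers")
    return 100.0 * from_cache / total


# Illustrative numbers: 800 Gbps served from the caches, 200 Gbps from outside.
print(offload_ratio(800, 200))  # 80.0
```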
Note: there are subtleties between CDNs in how we compute the Offload Ratio. We usually run these computations at peak time, because offload makes the most sense when traffic is at its highest. Cases like Netflix Open Connect are different, however, because their inbound traffic peaks when the caches fill themselves at night rather than during the delivery peak.
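To illustrate why the choice of peak matters, the Python sketch below picks the interval with the highest subscriber delivery before computing offload, and contrasts it with the hour of peak inbound traffic. The `Sample` fields, the assumption that cache fill arrives from off-net, and the numbers are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    """One hypothetical hourly sample of cache-related traffic, in Gbps."""
    hour: int
    cache_to_subs: float   # delivered to subscribers from the embedded caches
    offnet_to_subs: float  # delivered to subscribers from outside the network
    cache_fill: float      # inbound traffic filling the caches (assumed off-net)


def offload_at_delivery_peak(samples: List[Sample]) -> Tuple[int, float]:
    """Return (hour, offload %) at the hour of peak subscriber delivery."""
    peak = max(samples, key=lambda s: s.cache_to_subs + s.offnet_to_subs)
    total = peak.cache_to_subs + peak.offnet_to_subs
    return peak.hour, 100.0 * peak.cache_to_subs / total


def inbound_peak_hour(samples: List[Sample]) -> int:
    """Hour at which inbound (off-net) traffic peaks, cache fill included."""
    return max(samples, key=lambda s: s.offnet_to_subs + s.cache_fill).hour


# Night-time fill at 03:00 dominates inbound traffic; delivery peaks at 21:00.
samples = [Sample(3, 100.0, 20.0, 900.0), Sample(21, 900.0, 100.0, 50.0)]
```

With these made-up numbers, inbound traffic peaks at hour 3 while delivery peaks at hour 21, where the offload works out to 90%. Keying the calculation on the inbound peak would measure offload during cache fill instead.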
Users can enable or disable any Embedding CDN widget on the CDN Analytics landing page via the config screen. Here's what it looks like:
And here's what a fully furnished landing page with offload widgets can look like:
Even better, each of these widgets leads to a dashboard that lets users see precisely what the traffic patterns for the caches are with respect to offload, displaying all of the following types of cache traffic:
- OffNet to Subscribers (aka longtail)
- OnNet to Subscribers (aka last-mile delivery)
- Cache to Cache Fill Traffic
- OffNet Cache Fill Traffic
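The four traffic types can be read as combinations of where a flow comes from and where it goes. A minimal Python sketch, assuming each flow endpoint has already been given one of three hypothetical roles (`cache`, `subscriber`, `offnet`) by upstream lookups; the role names and the tuple-based flow format are illustrative only.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical endpoint roles assigned upstream (e.g. by IP-to-CDN and
# subscriber-prefix lookups).
CACHE, SUBSCRIBER, OFFNET = "cache", "subscriber", "offnet"

# (source role, destination role) -> cache traffic type from the list above.
TRAFFIC_TYPES = {
    (OFFNET, SUBSCRIBER): "OffNet to Subscribers (aka longtail)",
    (CACHE, SUBSCRIBER): "OnNet to Subscribers (aka last-mile delivery)",
    (CACHE, CACHE): "Cache to Cache Fill Traffic",
    (OFFNET, CACHE): "OffNet Cache Fill Traffic",
}


def classify(src_role: str, dst_role: str) -> Optional[str]:
    """Return the cache traffic type for a flow, or None if not cache-related."""
    return TRAFFIC_TYPES.get((src_role, dst_role))


def totals_by_type(flows: Iterable[Tuple[str, str, int]]) -> Dict[str, int]:
    """Sum bytes per traffic type from (src_role, dst_role, bytes) tuples."""
    totals: Dict[str, int] = {}
    for src_role, dst_role, nbytes in flows:
        traffic_type = classify(src_role, dst_role)
        if traffic_type is not None:
            totals[traffic_type] = totals.get(traffic_type, 0) + nbytes
    return totals
```

In this framing, the first two categories are the inputs to the Offload Ratio, while the two fill categories show what it costs to keep the caches populated.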
We hope you enjoyed this walkthrough of the internals of our CDN detection system. If you have any questions about it, please let Kentik's Product Team know. We count heavily on your input to add new CDNs and new capabilities to the workflow as our CDN detection engine gets smarter and smarter!