
Product Updates

Latest features, improvements, and product updates on Kentik's Network Observability platform.

Improvement · Core · SNMP
2 years ago

SNMP: CPU and Memory utilization for F5 BIG-IP devices

Kentik now supports SNMP collection of CPU and memory utilization for F5 BIG-IP devices. Some F5 BIG-IP devices do not support flow export, which means that collecting SNMP metrics from them requires Kentik's kproxy with the “bootstrap_devices” option.


By default, Kentik's SaaS ingest and kproxy start SNMP metrics collection for a device only once flows are received from it. This behavior can be limiting in some cases and can be changed with kproxy's Bootstrap Devices feature, which is enabled via the -bootstrap_devices command-line argument. The argument takes a comma-separated list of device IDs; for those devices, kproxy starts SNMP metrics collection immediately, without waiting to receive flows. More information about the kproxy CLI arguments can be found in our Knowledge Base article: https://kb.kentik.com/v0/Bd04.htm#Bd04-kproxy_CLI_Reference
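As a sketch, a kproxy invocation using the Bootstrap Devices feature might look like the following. The device IDs are hypothetical placeholders, any other arguments your deployment requires are omitted, and the exact flag syntax (= versus a space before the value) may differ; see the kproxy CLI reference linked above.

```shell
# Start SNMP polling of devices 1234 and 5678 immediately,
# without waiting for flows to arrive from them first.
# (Device IDs here are hypothetical examples.)
kproxy -bootstrap_devices=1234,5678
```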

kproxy determines that a discovered device is an F5 BIG-IP by looking for the “big-ip” string in the well-known SNMP sysDescr OID.
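A minimal sketch of that detection logic (the function name is our own, and case-insensitive matching is an assumption; the changelog only documents the “big-ip” substring check in sysDescr):

```python
def is_f5_bigip(sysdescr: str) -> bool:
    """Classify a discovered device as an F5 BIG-IP by looking for
    the "big-ip" string in its SNMP sysDescr (.1.3.6.1.2.1.1.1.0)."""
    return "big-ip" in sysdescr.lower()

print(is_f5_bigip("BIG-IP 15.1.8 Build 0.0.7 Point Release"))  # → True
print(is_f5_bigip("Cisco IOS Software, C3750"))                # → False
```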

kproxy monitors the following four device “Components”, using the SNMP OIDs described below:

  • Name “Global”:
    • MemoryTotal [bytes] - OID Name: sysGlobalHostMemTotal, OID: .1.3.6.1.4.1.3375.2.1.1.2.20.2.0
    • MemoryUsed [bytes] - OID Name: sysGlobalHostMemUsed, OID: .1.3.6.1.4.1.3375.2.1.1.2.20.3.0
    • CPU [percentage] - OID Name: sysGlobalHostCpuUsageRatio5m, OID: .1.3.6.1.4.1.3375.2.1.1.2.20.37.0
  • Name “TMM”:
    • MemoryTotal [bytes] - OID Name: sysStatMemoryTotal, OID: .1.3.6.1.4.1.3375.2.1.1.2.1.44.0
    • MemoryUsed [bytes] - OID Name: sysStatMemoryUsed, OID: .1.3.6.1.4.1.3375.2.1.1.2.1.45.0
    • CPU - value will be 0
  • Name “Other”:
    • MemoryTotal [bytes] - OID Name: sysGlobalHostOtherMemoryTotal, OID: .1.3.6.1.4.1.3375.2.1.1.2.20.44.0
    • MemoryUsed [bytes] - OID Name: sysGlobalHostOtherMemoryUsed, OID: .1.3.6.1.4.1.3375.2.1.1.2.20.45.0
    • CPU - value will be 0
  • Name “Swap”:
    • MemoryTotal [bytes] - OID Name: sysGlobalHostSwapTotal, OID: .1.3.6.1.4.1.3375.2.1.1.2.20.46.0
    • MemoryUsed [bytes] - OID Name: sysGlobalHostSwapUsed, OID: .1.3.6.1.4.1.3375.2.1.1.2.20.47.0
    • CPU - value will be 0

For each component, MemoryFree and MemoryUtilization are calculated from the collected MemoryTotal and MemoryUsed metrics. Each component uses the standard Uptime, which is collected at the device level from the sysUpTime OID.
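The derived metrics can be sketched as follows (a simple illustration of the arithmetic; the function and field names are our own assumptions, not the actual ingest code):

```python
def derive_memory_metrics(memory_total: int, memory_used: int) -> dict:
    """Compute the derived memory metrics for one component
    from the two collected SNMP values (both in bytes)."""
    memory_free = memory_total - memory_used
    utilization = (memory_used / memory_total * 100.0) if memory_total else 0.0
    return {
        "MemoryFree": memory_free,
        "MemoryUtilization": round(utilization, 2),
    }

# Example: a "Global" component reporting 16 GiB total, 4 GiB used
print(derive_memory_metrics(16 * 1024**3, 4 * 1024**3))
# → {'MemoryFree': 12884901888, 'MemoryUtilization': 25.0}
```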

The support is available in kproxy starting from version v7.36.0. An example Data Explorer query is shown below:


Dušan Pajin
Improvement · API
2 years ago

Kentik’s Python SDK version 1.0.0 released

Rich API capabilities and supporting SDKs are important characteristics of the Kentik platform. For the last two years, Kentik has supported the development of the community Python SDK, which is based on Kentik's APIs. This SDK enables our customers to use Kentik platform APIs natively in Python, with Python objects and methods, instead of dealing with the details of the API syntax.


The Community Python SDK is available on GitHub: https://github.com/kentik/community_sdk_python.

Just over a month ago, we released version 1.0.0. Until this version, the community Python SDK supported objects and methods exposed by Kentik's REST API v5. With this new release, support has been extended to some endpoints of our new gRPC-based Kentik API v6, specifically Synthetics monitoring and Cloud Export configuration.

Important note on breaking changes

As mentioned above, Kentik's API v6 is natively a gRPC API, but it also supports REST access. The community Python SDK uses Kentik API v6 directly over gRPC. To accommodate communication with the Kentik backend over both the REST-based API v5 and the gRPC-based API v6, a change has been introduced that requires updating your existing Python scripts and programs.

In most cases, you would initialize the KentikAPI object with a constructor call that uses the api_url argument, for example:

from kentik_api import KentikAPI

client = KentikAPI(api_url=KentikAPI.API_URL_US, auth_email=email, auth_token=token)

The api_url argument expects the URL of the Kentik API endpoint, in the form https://api.kentik.com or https://api.kentik.eu. However, the endpoint used for Kentik API v6 is specified as a host, for example: grpc.api.kentik.com.

For this reason, and to allow configuring API access with a single parameter, the api_url argument of the KentikAPI constructor has been replaced with the api_host argument. The argument is expected to contain only the fully qualified hostname of the server hosting the target Kentik API instance (the default value is KentikAPI.API_HOST_US, which equals api.kentik.com). Consequently, the class variable KentikAPI.API_URL_US has been replaced with KentikAPI.API_HOST_US, and KentikAPI.API_URL_EU with KentikAPI.API_HOST_EU.
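If your existing code stores the old URL-style value in configuration, a small helper can derive the new host-only form during migration. This helper is our own illustration, not part of the SDK:

```python
from urllib.parse import urlparse

def api_url_to_host(api_url: str) -> str:
    """Convert a v5-style api_url (e.g. 'https://api.kentik.com')
    into the host-only form expected by the api_host argument."""
    parsed = urlparse(api_url)
    # urlparse only fills netloc when a scheme is present;
    # fall back to the raw string for bare hostnames.
    return parsed.netloc or api_url

print(api_url_to_host("https://api.kentik.com"))  # → api.kentik.com
print(api_url_to_host("api.kentik.eu"))           # → api.kentik.eu
```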

To summarize, if you upgrade your Python SDK to version 1.0.0 or later, you will need to adjust the initialization of KentikAPI to use the new argument, for example:

from kentik_api import KentikAPI

client = KentikAPI(api_host=KentikAPI.API_HOST_US, auth_email=email, auth_token=token)

Installation

You can easily install the latest version of the Python SDK using pip, for example:

$ python3 -m pip install kentik-api

Let us know what you think about our Python SDK and feel free to submit any contributions or issues over GitHub. Happy coding!

Dušan Pajin
Improvement · Core · SNMP
2 years ago

SNMP: CPU and Memory utilization for Palo Alto Networks devices

The list of devices from which Kentik can collect CPU and memory utilization is growing. Kentik now supports SNMP collection of CPU and memory utilization for Palo Alto Networks devices.


Palo Alto devices use the standard HOST-RESOURCES-MIB for CPU and memory usage. Kentik's ingest/kproxy determines that a discovered device is from Palo Alto Networks by looking for the “palo alto” string in the well-known SNMP sysDescr OID.

For CPU:

  • Component name is provided in the OID: hrDeviceDescr: .1.3.6.1.2.1.25.3.2.1.3.index
  • CPU utilization is provided in the OID: hrProcessorLoad: .1.3.6.1.2.1.25.3.3.1.2.index

For Memory:

  • Component name is provided in the OID: hrDeviceDescr: .1.3.6.1.2.1.25.3.2.1.3.index
  • Memory utilization is provided from the SNMP table: hrStorageTable: .1.3.6.1.2.1.25.2.3
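Conceptually, the per-component values are obtained by walking both columns and joining rows on their shared table index. A rough sketch of the CPU side (our own illustration, not Kentik's ingest code):

```python
# Base column OIDs from HOST-RESOURCES-MIB; rows are addressed by index
HR_DEVICE_DESCR = ".1.3.6.1.2.1.25.3.2.1.3"    # component name
HR_PROCESSOR_LOAD = ".1.3.6.1.2.1.25.3.3.1.2"  # CPU load, percent

def cpu_by_component(descr_rows, load_rows):
    """Join hrDeviceDescr and hrProcessorLoad table rows on their
    shared index, yielding {component_name: cpu_percent}."""
    names = dict(descr_rows)
    return {names[i]: load for i, load in load_rows if i in names}

# Rows as (index, value) pairs, as an SNMP walk of each column would return
descr = [(1, "CPU 0"), (2, "CPU 1"), (9, "Management port")]
load = [(1, 12), (2, 7)]
print(cpu_by_component(descr, load))  # → {'CPU 0': 12, 'CPU 1': 7}
```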

The support is available in kproxy starting from version v7.36.0. An example Data Explorer query is shown below:


Dušan Pajin
Improvement · Core · Flow
2 years ago

Flow Ingest: Support for VLAN Fields in NetFlow/IPFIX

Kentik now supports collection of the NetFlow and IPFIX fields for source/destination VLAN, which were previously not collected from received flows.


The related VLAN fields are shown in tables below:

NetFlow v9 VLAN fields

Field Type | Value | Length (bytes) | Description
SRC_VLAN   | 58    | 2              | Virtual LAN identifier associated with ingress interface
DST_VLAN   | 59    | 2              | Virtual LAN identifier associated with egress interface

Resource: https://www.cisco.com/en/US/technologies/tk648/tk362/technologies_white_paper09186a00800a3db9.html

IPFIX VLAN fields

ElementID | Name       | Abstract Data Type | Description                                              | Reference
58        | vlanId     | unsigned16         | Virtual LAN identifier associated with ingress interface | [RFC5102]
59        | postVlanId | unsigned16         | Virtual LAN identifier associated with egress interface  | [RFC5102]

Resource: https://www.iana.org/assignments/ipfix/ipfix.xhtml

These two fields are collected from the NetFlow/IPFIX protocols and stored in Kentik's Source VLAN and Destination VLAN dimensions.
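Both fields are 2-byte unsigned integers in network byte order, so decoding them from a flow record can be sketched like this (an illustration of the field layout only, not Kentik's actual decoder; the function and key names are our own):

```python
import struct

# NetFlow v9 / IPFIX element IDs for VLANs (see the tables above)
SRC_VLAN = 58  # vlanId: ingress interface
DST_VLAN = 59  # postVlanId: egress interface

def decode_vlan_fields(fields):
    """Extract source/destination VLAN IDs from a decoded flow record,
    given a map of {element_id: raw_bytes}. Both fields are 2-byte
    unsigned integers in network (big-endian) byte order."""
    out = {}
    if SRC_VLAN in fields:
        out["src_vlan"] = struct.unpack("!H", fields[SRC_VLAN])[0]
    if DST_VLAN in fields:
        out["dst_vlan"] = struct.unpack("!H", fields[DST_VLAN])[0]
    return out

print(decode_vlan_fields({58: b"\x00\x64", 59: b"\x00\xc8"}))
# → {'src_vlan': 100, 'dst_vlan': 200}
```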

The support is available in kproxy starting from version v7.36.0. An example Data Explorer query is shown below:


Dušan Pajin
Improvement · Hybrid Cloud · BETA
2 years ago

Connectivity Checker Updates

We continue developing new features for the Connectivity Checker to ensure a better user experience for our customers.


  • It’s now possible to run an ad-hoc test without creating a report first. This test can be saved as a report for later use.
  • New source and destination types. You can run a test between subnets, network interfaces, and instances. All source and destination types are searchable by name.

  • You can start a connectivity test directly from the object on the topology, with the source being automatically pre-filled.

  • A direct link to the AWS console has been added under the details of impacted objects, making it easier to troubleshoot failed connectivity tests.


Ievgen Vakulenko
Improvement · Hybrid Cloud · Core
2 years ago

Additional Dimensions for AWS Service Traffic Filtering

Data Explorer allows you to drill into your traffic flows to better understand what is going on in your network. It uses different dimensions and filters to show you the flows of interest.

ENI Entity Name and ENI Entity Type have been added to the list of supported dimensions for AWS. This allows users to see and filter traffic to and from AWS services such as load balancers and VPC endpoints.



Ievgen Vakulenko
Improvement · Core · Service Provider
2 years ago

Introducing Site Markets

By popular demand, we've added a new dimension named Site Market to the Kentik Portal. It allows users to group sites into Markets and use this new dimension in all parts of the product to filter or group by.


When is this feature useful?

We've heard many times that having multiple PoPs in the same metropolitan area makes it difficult to pivot Guided Mode dashboards.
For instance, it is not uncommon for large networks to have multiple PoPs around Los Angeles, say LAX1, LAX2, ..., LAXN. Most of the time these networks want to treat all of these as a single metro: enter Site Markets. Slap a "Los Angeles Metro" Site Market label on all of them and you can now use it as a Guided Mode dashboard parameter!

With its Ultimate Exit Site Market counterpart, and the special "Site Market doesn't equal Ultimate Exit Site Market" filter, large networks can run queries that tell them when traffic from one of their peers isn't hot-potato'ed and is instead carried at higher cost over long distances.

How can I configure the feature?

You can navigate to Settings > Manage Sites where you'll notice the new Site Market attribute. You can create a new Site Market on the spot or pick an already existing one.

From the Manage Sites screen, you can even bulk-assign Site Markets to sites, which obviously comes in handy the first time you discover the feature.

When first configuring your Site Markets, you can also leverage the filters on the Sites screen to quickly select all sites without a Site Market, which, together with the aforementioned bulk assignment, really helps you move faster.

The Manage Site Markets screen

Additionally, Site Markets now have their own Settings screen, which can be found here:

Custom Geos vs Site Markets

There is a misleading parallel between Custom Geos and Site Markets, so let's compare the two:

  • Custom Geos: map a Source Geo and Destination Geo based on
    • GeoIP data from Source IP and Destination IP, and
    • the Custom Geo-to-country mapping from the Custom Geo user configuration
  • Site Markets: map a Site Market based on
    • the Site-to-Site Market mapping from the Site Market user configuration

The main differences are:

  • Site Market is a superset of Sites, whereas Custom Geo is a superset of countries
  • Site Market is not directional (green column in Data Explorer's dimension selector), while Custom Geo exists in Source and Destination variants

Using Site Markets

Site Market dimensions are available in both Data Explorer Dimensions and Filters, and come with their "Ultimate Exit" counterpart: Ultimate Exit Site Market is mapped from Ultimate Exit Site using the user's Site Market groupings from the configuration screen.

Finally, the Site Market filter comes with a very useful operand, "Site Market doesn't equal Ultimate Exit Site Market", which will help networks identify traffic that does not stay local (and is therefore more expensive to transit). We trust this feature will be very useful to large service provider networks.
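The mapping and the "doesn't equal" comparison can be sketched as follows (illustrative only; the site names and function are our own examples, not product code):

```python
# Site -> Site Market mapping, as configured in Settings > Manage Sites
SITE_MARKETS = {
    "LAX1": "Los Angeles Metro",
    "LAX2": "Los Angeles Metro",
    "SEA1": "Seattle Metro",
}

def leaves_market(ingress_site: str, ultimate_exit_site: str) -> bool:
    """True when traffic entering at ingress_site exits in a different
    Site Market, i.e. it is not hot-potato'ed locally."""
    return SITE_MARKETS.get(ingress_site) != SITE_MARKETS.get(ultimate_exit_site)

print(leaves_market("LAX1", "LAX2"))  # → False (stays in the LA metro)
print(leaves_market("LAX1", "SEA1"))  # → True  (carried long-haul)
```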

Enjoy and let us know what you think about the feature!

Greg Villain
Agents & Binaries · BETA
2 years ago

kbgp (Kentik BGP proxy) goes Beta

BGP enrichment of flow-collected telemetry from a Kentik user's network was historically made possible via public peering of a Kentik-registered device to Kentik's public BGP cluster (or by leveraging the BGP table from another, BGP-peered device).

However, there are situations where devices exporting flow telemetry to Kentik cannot reach the public internet (e.g., have no public IP address for a BGP session with Kentik).

Enter Kentik's BGP proxy, aka kbgp.


What is kbgp? tl;dr

Kentik BGP proxy, aka kbgp, has just been released in its initial beta version. It can be deployed in a private environment; once deployed, the aforementioned devices can peer with the kbgp instance, which will multiplex and relay all BGP updates in real time, "as if" these devices were peering with our public BGP ingest layer.

Initial discussions took place around making kbgp a part of kproxy, Kentik's well-known flow proxy. There is a variety of pros and cons, but for now the main deciding factor was separation of concerns: the BGP and flow proxy functions are fundamentally different, and we wanted to avoid building a monolithic agent that would instantly become a single point of failure, as well as situations where flow export could be interrupted by the BGP portion and vice versa.

How does kbgp work?

We are pretty proud of the design behind kbgp. A lot of engineering forethought went into it, and early-access customers have been impressed with the polish, stability, and scalability of this early version and are already starting to adopt it quickly.

Multiple Kentik-registered devices can peer with a single kbgp instance, as it is highly scalable and takes care of all the multiplexing over a secure gRPC transport. To remain as unintrusive as possible, kbgp does not store BGP state: it only manages the state of the peering sessions established with it and forwards any BGP updates to Kentik in real time.

This offers a few additional side benefits:

  • kbgp scales very well (since it does not store routes) and can accept peering sessions from a lot of devices. The maximum number of BGP sessions per kbgp agent is as yet unknown, so make sure not to create a single point of failure.
  • The transport chosen back to Kentik's BGP ingest layer adds a layer of encryption for the BGP updates relayed over the public internet, an added benefit of gRPC.
  • A great side benefit is that it unlocks IPv6 BGP peering (and therefore BGP-related enrichment of IPv6 flows) without the need to establish a public IPv6 BGP session. All that's needed is an internal IPv6 address on the device to peer from, and the updates will be transported by kbgp over IPv4.

What comes next / How can you gain access to it?

As with any beta software, kbgp comes with a few rough edges, but our closed-beta testers have so far been very positive about it.

Contact your Customer Success Engineer if you want access to the agent and to deploy it on your systems; our engineering team is ready to welcome your feedback to make it better!

As we build kbgp's roadmap toward GA, more features will be added so that you can manage it more efficiently. This will likely include making kbgp a first-class citizen in the Kentik Portal and reporting the health of its host and of the BGP sessions it proxies. Stay tuned!

We may in the future consolidate kproxy and kbgp into a single binary, but to avoid each function competing for resources on the host, and to prevent the resulting agent from becoming a single point of failure, we may very well favor a mutually exclusive switch that turns the agent into one or the other.
Don't hesitate to let us know your thoughts on this.

Greg Villain
Improvement · UI/UX · Synthetics
2 years ago

Synthetics Incident Log enhancements


The Incident Log has been enhanced with some powerful new features.

The log now supports a time series stacked bar chart displaying the volume of alerts by alert severity. This enables users to quickly identify alerting trends over time.


The log also supports many new filtering options that allow the log and the stacked bar chart to be filtered by alert severity, alert status, test name, test type, label and agent (private/global).

When one or more filters are selected, an indicator displays the total number of filters currently enabled, along with the time range selection.

The selection is also persistent, so navigating away from this page will not cause the selection to be lost.

Finally, we will be collapsing the Performance Dashboard menu item into the Synthetics title itself. 

Clicking on "Synthetics >" will bring you to the (soon to be) newly named "Synthetics Dashboard".


Sunil Kodiyan
Improvement · Insights & Alerting · DDoS
2 years ago

Policy Configuration: Build and Edit your Policies

The new Alert Policy editor introduces a common policy authoring experience for both Custom policies and DDoS policies. You can navigate here from the Policies page, or by selecting a template from the Policy Templates page to use as the prototype for a new policy. 


We’ve redesigned this experience to simplify configuring and enabling a policy to trigger alerts. The same configuration options apply for constructing both DDoS detections and custom alerts, and the workflow to configure these is the same. 

Configuring alerts can be complex, but constructing the conditions for alerts is both powerful and very flexible. We've done some work in this revision to accelerate and simplify the process of building a policy.

Let's start with navigation. There are four tabs in the policy editor; you navigate to each of them by selecting the heading at the top of the form.

You'll also notice a new "Summary" display on the right side of the page, to help keep track of values from other tabs as you work, and also to see and correct validation issues that arise.

Clicking on validation issues will navigate directly to the tab where you can resolve the issue.

In the General tab, you'll describe the policy, naming it and providing a description of your intent. This is also where you'll enable the policy to create alerts, or silence it while it builds baselines, and where you can specify a dashboard to display alerts triggered by this policy.

In the Dataset tab, you'll define the focused subset of data you're interested in evaluating. This is the data that will be examined and tested against the conditions that will be defined in the Thresholds tab.

The controls on this page are similar to those you're familiar with from Data Explorer: defining sources of data and the specific dimensions and metrics that make up the information this policy will evaluate against each of the thresholds defined in the next tab. This is also where specific filters can be applied to refine the data or exclude data that should not be evaluated.

You can read more detail about the dataset selection dialogs in the Kentik Knowledge Base article "Alert Policies" topic.

One new feature you'll notice here is that the content of this page has been simplified, compared to the previous release - and all of the more complex and detailed options have been moved into an expandable area at the bottom of the page in "Advanced Settings."

For most alerts, you won't need to change the configurations here, but they are available to advanced users for specific use cases.  Refer to the Knowledge Base for more detailed guidance on setting these parameters.

Moving to Thresholds, you'll see five tabs that define threshold conditions and actions for each of the five levels of severity.

Conditions describe when alerts will be triggered for this level of severity; you can define conditions for traffic volumes, presence in the top keys, capacity for an interface, or ratios between metrics. Ratio conditions are new with this release, and evaluate the relationship between metrics to determine a trigger for the alert.
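As an illustration of how a ratio condition evaluates (a sketch of the concept only, not Kentik's alerting code; the function name and metrics are our own examples):

```python
def ratio_condition(numerator: float, denominator: float,
                    threshold: float) -> bool:
    """Trigger when the ratio between two metrics exceeds a threshold,
    e.g. outbound versus inbound bits per second."""
    if denominator == 0:
        return False  # avoid division by zero; no ratio to evaluate
    return numerator / denominator > threshold

# e.g. alert when outbound traffic is more than 3x inbound
print(ratio_condition(9_000_000, 2_000_000, 3.0))  # → True
```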

Actions describe automated notifications or DDoS mitigations that will be executed when the alert is triggered.

Finally, the Baseline tab has been simplified to offer one of three presets. Each of these describes how baselines for threshold conditions are constructed.

In most cases, the Default preset will produce a useful baseline. You can select "Express" to produce a baseline more rapidly, or "Precision" to build a more detailed baseline over a longer period of time.

In this tab, you also have access to the individual detailed configuration parameters through the "Advanced Options" area.

Finally - there are preconfigured Policy Templates you can access from the Policies page, or through the "Add Policy" dialog: 

Policy Templates are prototype policies you can copy and customize for your requirements. Selecting a policy template here provides the same Summary view of the template's values to guide you in making a useful selection.

This represents our initial body of work to simplify the alerting workflow, and we hope you find these changes easier to work with. We appreciate that each change comes with a learning curve, so we'll continue to improve this area of the product incrementally.

As always, please let us know what you think in the comments!

Joe Reves