NMS: What's New in the Last 6 Months
We're thrilled to share the enhancements we've added to Kentik NMS over the past six months. Your feedback and continued support have been instrumental in driving our newest product forward. Here's a summary of the key updates:
New Device Workflows
- Topology Connections: We now show “Connections” on the “Device Overview” screen and in the “Interface Details” drawer. LLDP connections are automatically detected. Manual connections can be added where LLDP is not in use and are marked with an “M”.
- Device Dependencies: NMS can now detect when one device is downstream from another. When the upstream device goes down and the downstream device does not respond, the upstream device is marked as down while the downstream device(s) are marked as unobservable. This makes it easy to tell which device is actually down and needs your attention, and which devices are simply not responding because of the down device. The first place you will see this is in the status of a device.
Additionally, alerts configured to trigger when a device goes into status down will not trigger for these devices, of course. This is the default and will greatly improve the signal-to-noise ratio when alerting on downed devices. If for some reason you do want to alert for these devices, you can configure the alert to trigger for devices in status down or status unobservable.
To enable this functionality, you must specify the “Closest Network Device” for the NMS agent by navigating to Settings > Universal Agents (NMS), selecting an agent in the table, and clicking edit.
- Unobservable due to agent down: If an NMS agent goes down, all devices monitored by that agent are marked as Unobservable. Mousing over the status label displays a notice explaining that the agent is down and giving the name of the down agent.
- ICMP-only devices: Sometimes you don’t have SNMP access to a device but still want to monitor it. NMS now supports this use case with "ICMP only" devices. Add these devices by going to "Menu > NMS > Devices" and clicking "Add Devices" in the top right. At the prompt, select "PING ONLY". Doing so provides up/down status and latency.
- API for Device CRUD and Query: To better serve the largest and most complex networks in the world, you can now use the Kentik API to create, read, update, and delete (CRUD) NMS devices. You can also now run Metrics Explorer-style queries via the API. To make this easier, Metrics Explorer can show you the API query for any query you’ve built in the UI.
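To make the device dependency and unobservable rules above concrete, here is a minimal sketch of the status logic as described in this post. It is an illustration only, not Kentik's implementation: the names `Status`, `resolve_statuses`, and `should_alert` are hypothetical, and the sketch assumes the upstream relationships contain no loops.

```python
from enum import Enum


class Status(Enum):
    UP = "up"
    DOWN = "down"
    UNOBSERVABLE = "unobservable"


def resolve_statuses(responding, upstream_of, agent_up=True):
    """Derive a status for every device (hypothetical model, not Kentik code).

    responding:  dict of device name -> True if the device answered polling.
    upstream_of: dict of device name -> name of its upstream device.
                 Assumed to be free of cycles.
    agent_up:    False when the monitoring agent itself is down.
    """
    statuses = {}

    def resolve(device):
        if device in statuses:
            return statuses[device]
        if not agent_up:
            # With the agent down, nothing it monitors can be observed.
            status = Status.UNOBSERVABLE
        elif responding.get(device, False):
            status = Status.UP
        else:
            parent = upstream_of.get(device)
            if parent is not None and resolve(parent) is not Status.UP:
                # Silent, but the upstream device is the likely cause.
                status = Status.UNOBSERVABLE
            else:
                status = Status.DOWN
        statuses[device] = status
        return status

    for device in responding:
        resolve(device)
    return statuses


def should_alert(status, include_unobservable=False):
    """Default: alert on down only; optionally include unobservable."""
    if status is Status.DOWN:
        return True
    return include_unobservable and status is Status.UNOBSERVABLE


# Example: "core" fails, so everything behind it stops responding.
statuses = resolve_statuses(
    responding={"core": False, "dist": False, "access": False, "other": True},
    upstream_of={"dist": "core", "access": "dist"},
)
devices_to_alert = [d for d, s in statuses.items() if should_alert(s)]
```

In this example only the upstream device is treated as down, so a down-status alert triggers once rather than once per affected device, which is the signal-to-noise improvement described above.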
New Alerting Workflows
- Acknowledge Active Alerts: Alerts can now be acknowledged even while they are active. This is a common practice to indicate to the rest of the team that someone is aware of the issue and is taking action. The name of the acknowledger will be noted, and a comment can optionally be added. Users can also choose to automatically acknowledge additional occurrences of the same alert ("auto-ack"), for situations like a flapping link.
- Silence Notifications: Notifications can be "silenced" and "unsilenced" from the Alerting page with the push of a button. Alerts will still trigger and can be seen in the UI, but notification channels will not be executed. By selectively silencing notifications for alerts, network admins can better manage focus and reduce noise for their team.
- Suppress Alerts: You can now prevent an alert policy from triggering altogether by suppressing it from the Alerting page.
Suppressed alert policies will not trigger, so their alerts will not show up on the alert list and notifications will not be executed. You can see all configured Alert Suppressions on the Alert Suppressions page in Settings. From this page, users can view, create, edit, and delete Alert Suppression Patterns.
- State Alerts: State alerts were previously limited to out-of-the-box supported entities - Devices, Interfaces, and BGP neighbors only. You can now configure state alerts for any metric, including custom metrics. Put another way, threshold alerts trigger when a metric is less than or greater than a value (or baseline), whereas state alerts trigger when a metric equals a specific value, for example the number 3, which corresponds to an interface being down.
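The differences between threshold and state alerts, and between silencing and suppressing, can be sketched in a few lines. This is a simplified model of the behavior described above, not Kentik's alerting engine: the `Policy` class, its fields, and the `process` function are hypothetical names chosen for illustration, and the state value 3 is simply the example used in this post.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    name: str
    kind: str                 # "threshold" or "state"
    target: float             # threshold limit, or the state value to match
    direction: str = "above"  # threshold only: "above" or "below"
    silenced: bool = False    # alert still triggers, notifications are skipped
    suppressed: bool = False  # policy does not trigger at all


def condition_met(policy, value):
    """State alerts test equality; threshold alerts test greater/less than."""
    if policy.kind == "state":
        return value == policy.target
    if policy.direction == "above":
        return value > policy.target
    return value < policy.target


def process(policy, value, alert_list, notifications):
    """Record an alert and, unless silenced, a notification."""
    if policy.suppressed or not condition_met(policy, value):
        return  # suppressed policies never reach the alert list
    alert_list.append(policy.name)
    if not policy.silenced:
        notifications.append(policy.name)


alert_list, notifications = [], []
process(Policy("cpu-high", "threshold", 90), 95, alert_list, notifications)
process(Policy("if-down", "state", 3, silenced=True), 3, alert_list, notifications)
process(Policy("if-down-lab", "state", 3, suppressed=True), 3, alert_list, notifications)
```

In this model a silenced policy still appears on the alert list but sends nothing, while a suppressed policy leaves no trace in either list.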
Quality Enhancements
- Status Bugs: In very specific scenarios, devices were incorrectly marked as down. While most customers were not affected by this bug, those that were had instances where many devices were reported down but were not really down.
- SNMP Polling Efficiency: Several bug fixes involving SNMP timeouts, conflicting statuses and agent stability.
- Query Performance: To reduce the time it takes to load some of the more complex NMS pages, we've made backend query optimizations to improve performance. While there's still room for improvement, we think you'll already notice the difference.
- Other Bugs: We addressed numerous issues relating to usability, reliability and predictability - especially where alerting is concerned.
In addition to the software changes above, you will find a great deal more information about NMS available in the knowledge base. These articles will help you get the most out of Kentik NMS.
We are committed to continuously improving to meet your needs and exceed your expectations. We encourage you to explore these enhancements and send us your feedback!
Thank you for being a part of the Kentik community. Stay tuned for more exciting updates soon.