Reengineering our Scheduled Reports Subscription engine from the ground up
One of Kentik Portal's time-tested assets has always been its Network Analytics, and more specifically the Dashboarding and Reporting capabilities it offers. Around this time last year, we augmented our Subscriptions engine with the ability to name generated report files and to select TO, CC and BCC recipients for these reports more granularly. Rewind another year to 2022: we created a UX pattern across Kentik Portal that lets users easily configure these reports in a cohesive interface under the now famous Share button.
We're at it again this year, and this time we have completely rebuilt the foundations of the backend systems that generate these reports and send them to you, so read on!
What did we set off to improve?
First, let's look at how Report Subscriptions used to work: based on each user's configuration, these reports would be run once a day in our SaaS and on-prem clusters.
The implementation that served us well for all these years was a sequential one: once a day, a job would run, collect all the reports from the entire installed base and queue them up to be generated and sent sequentially.
While robust, this implementation presented the following drawbacks we weren't fans of:
- While once a day works for a set of customers in a narrow geography, reports configured by users located around the world would get generated at times that weren't in line with these users' time zones, so subscription emails arrived at impractical times. Since some of these reports were based on look-back windowed analytics, every look-back window was also tied to the time at which the report was generated. As an example, midnight Pacific corresponds to mid-afternoon in Singapore, so our Singaporean users wanting a one-day report from midnight to midnight could actually only configure a one-day report from 3pm to 3pm, which didn't exactly line up with the activities of their users.
- The job running all our tenants' subscribed reports sequentially could occasionally fail or choke on a single report generation, delaying the rest of the queue until we could unblock it.
- As we monitored the number of reports generated every day, we also started monitoring how long the main daily job took to run. In the most recent runs, report generation would hum along for about 7 hours to generate and send all reports.
-> Again, this not only affected the time at which reports were sent, but also made the look-back windows undesirably elastic (i.e. less predictable).
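The Singapore example above can be worked through in a few lines of Python. This is only an illustrative sketch: it assumes the legacy job fired at midnight Pacific Daylight Time (UTC-7), uses fixed offsets rather than real time zone rules, and `one_day_lookback` is a hypothetical helper, not part of Kentik's code.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, assumed for illustration only.
PDT = timezone(timedelta(hours=-7), "PDT")  # Pacific Daylight Time
SGT = timezone(timedelta(hours=8), "SGT")   # Singapore Time


def one_day_lookback(run_at: datetime, viewer_tz: timezone) -> tuple[datetime, datetime]:
    """Return the (start, end) of a 24-hour look-back ending at run_at,
    expressed in the viewer's time zone."""
    end = run_at.astimezone(viewer_tz)
    return end - timedelta(days=1), end


# A daily job that starts at midnight Pacific...
job_run = datetime(2024, 7, 1, 0, 0, tzinfo=PDT)

# ...produces a window that runs 3pm to 3pm for a reader in Singapore.
start, end = one_day_lookback(job_run, SGT)
print(start.strftime("%H:%M"), "->", end.strftime("%H:%M"))
```

And since the legacy job took hours to work through its queue, the actual end of the window would drift even later for reports further down the line.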
What does the new Report Subscriptions system look like now?
As we release the new Report Subscription engine to our customers, we can confidently tackle current and future challenges thanks to a new design:
- Users can now set the UTC time at which they desire their reports to be generated and sent
- Our report generation frequency is now granular to the 30-minute time slot (because believe it or not, some time zones are offset from UTC by 30 minutes)
- As a by-product, report subscriptions can now be aligned precisely with the local windows of network activity our users care about
- Moving away from a single monolithic job in every SaaS cluster, we also added parallelization: multiple workers now trigger under the same scheduled job, which expedites the generation process
- As a data point, after migrating all of our users' Report Subscriptions to the new engine, we kept the time at which the single legacy job used to run as the configured time for all of them. Instead of lasting 7 hours, the run took 5 minutes thanks to parallelization!
- As an additional byproduct, report generation is now much more scalable because it is smoothed throughout the day, and any failing report fails more gracefully because 1) reports run in parallel, and 2) we took this opportunity to instrument the new system more thoroughly than the legacy one.
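To illustrate the two ideas in the list above (30-minute slots and parallel workers that isolate failures), here is a minimal Python sketch. It is not Kentik's implementation: the subscription dictionaries, `slot_index`, `group_by_slot`, `run_slot` and the thread pool are all assumptions chosen to keep the example self-contained.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed

SLOT_MINUTES = 30  # 48 slots per day


def slot_index(hour: int, minute: int) -> int:
    """Map a UTC time of day to its half-hour slot (0..47)."""
    return (hour * 60 + minute) // SLOT_MINUTES


def group_by_slot(subscriptions):
    """Bucket subscriptions by the UTC slot they are configured for."""
    slots = defaultdict(list)
    for sub in subscriptions:
        slots[slot_index(sub["hour"], sub["minute"])].append(sub)
    return slots


def run_slot(subs, generate, max_workers=8):
    """Generate all reports of one slot in parallel.
    A failing report is recorded but does not block the others."""
    done, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(generate, sub): sub for sub in subs}
        for fut in as_completed(futures):
            sub = futures[fut]
            try:
                fut.result()
                done.append(sub["id"])
            except Exception as exc:
                failed.append((sub["id"], str(exc)))
    return sorted(done), sorted(failed)


# Hypothetical generator: report 2 fails, the others succeed.
def fake_generate(sub):
    if sub["id"] == 2:
        raise RuntimeError("query timed out")


subs = [
    {"id": 1, "hour": 7, "minute": 0},
    {"id": 2, "hour": 7, "minute": 0},
    {"id": 3, "hour": 5, "minute": 30},
]
slots = group_by_slot(subs)
done, failed = run_slot(slots[slot_index(7, 0)], fake_generate)
print(done, failed)
```

In this sketch, the failure of report 2 is collected alongside the successful report 1 instead of stalling the rest of the slot, which is the behavior the sequential queue could not offer.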
... but wait, there's more
For those keen on the last-minute "But wait, there's more..." gimmick, in the spirit of the "one more thing" made famous by Apple's product keynotes, we figured we would slip a few additional features our users had also been asking for into this release:
- Ability to set a Dashboard or Saved View lookback window to: "This Month" (i.e. month to date, regardless of what day of the month it is)
- Ability to set a Dashboard or Saved View lookback window to: "Last Month"
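As a rough sketch of what these two lookback windows mean, the Python below computes them from a given "now". The source does not state the exact boundaries Kentik Portal uses, so the half-open window for "Last Month" and the function names `this_month` and `last_month` are assumptions.

```python
from datetime import datetime, timezone


def this_month(now: datetime) -> tuple[datetime, datetime]:
    """Month to date: from the first instant of the current month up to now."""
    start = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return start, now


def last_month(now: datetime) -> tuple[datetime, datetime]:
    """The full previous calendar month, as a half-open [start, end) window."""
    end = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if end.month == 1:
        # January rolls back to December of the previous year.
        start = end.replace(year=end.year - 1, month=12)
    else:
        start = end.replace(month=end.month - 1)
    return start, end


now = datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)
print(this_month(now))
print(last_month(now))
```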
So, what do you think?
More often than not, some capabilities don't make it in by release time. As upsetting as that usually is to the Product Manager working on the feature set, we rely on iterations, and whatever capability doesn't make it right now will eventually find its way into the product in a future release if we hear our customers asking for it. This time around, we wanted to implement time zones directly in the Subscription Target time configuration, but couldn't fit it into our busy schedule. This means that users will still incur the one-time mental cost of computing the UTC equivalent of their local time when configuring a subscription's schedule, and that schedule times will need to be changed manually twice a year by users whose country observes Daylight Saving Time.
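Until time zones are supported natively, the UTC equivalent can be computed with a few lines of Python. This is a sketch using the standard `zoneinfo` module (Python 3.9+, with time zone data available on the system); `utc_schedule_time` is a hypothetical helper and the cities are only examples.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo


def utc_schedule_time(local_time: time, tz_name: str, on: date) -> time:
    """UTC time of day equivalent to local_time in tz_name on the given date."""
    local_dt = datetime.combine(on, local_time, tzinfo=ZoneInfo(tz_name))
    return local_dt.astimezone(timezone.utc).time()


# An 8am New York report needs a different UTC setting in summer and in winter.
summer = utc_schedule_time(time(8, 0), "America/New_York", date(2024, 7, 1))
winter = utc_schedule_time(time(8, 0), "America/New_York", date(2024, 12, 1))
print(summer, winter)

# A half-hour offset zone lands on a :30 slot.
india = utc_schedule_time(time(8, 0), "Asia/Kolkata", date(2024, 7, 1))
print(india)
```

The summer and winter values differ by one hour, which is the manual twice-a-year adjustment mentioned above.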
Once you get to take this feature for a spin, we'd love to hear your thoughts about where you'd like us to take it in the future, and remember, if history repeats itself, we will probably be evolving Report Subscriptions for another round next summer ;)