Bugzilla – Bug 3114
Add total sum to histogram metric / Add integrated support for generating Prometheus metrics?
Last modified: 2017-12-04 11:42:03 CET
We at Kumina are the maintainers of this piece of software:
It's a tool that connects to Unbound's control socket, sends it the "UBCT1 stats_noreset" command and converts the statistics it receives from Unbound to the format supported by the Prometheus monitoring system:
Though this exporter works pretty well and has a loyal fanbase, there are a couple of things that could be improved:
The command it sends to Unbound, "UBCT1 stats_noreset", is not documented. We only figured out its existence after inspecting unbound-control's source code. We don't want to invoke unbound-control here, as forking/executing may have a significant overhead for some workloads (high-frequency monitoring). "Shelling out" in Prometheus exporters tends to be frowned upon.
The histogram.* metrics don't map cleanly to a Prometheus histogram. In addition to bucketing the number of samples, Prometheus also likes to know the aggregated sum of all of those samples. This can be stored in constant space and has the advantage that it allows for the computation of average request latency over time.
Unbound currently only exports a per-bucket count, not the total sum, meaning that we currently have to guess this value by assuming the samples are uniformly distributed in every bucket.
I'm getting in touch because I'd like to explore what we can do to improve these things. In the short term, we could document "UBCT1 stats_noreset" and extend the Unbound code to also export an aggregated sum of the latency histogram.
In the long term, maybe it makes sense to follow a more radical approach. In my opinion, Prometheus is the only monitoring system that has a sane, extensible, yet simple and well documented format for how metrics should be exported:
Maybe Unbound could gain integrated support for exposing stats in Prometheus' format, eventually removing the need for our separate exporter (and "UBCT1 stats_noreset") entirely? This has the advantage that we no longer need to spend any effort to keep the exporter in sync with the implementation.
Finally, I'd like to mention that if integrated support for Prometheus' format is out of the question, we at Kumina are open to transferring copyright of the exporter to NLnet Labs, so that it may become part of the Unbound source distribution.
Thank you for your message on the Prometheus monitoring system and the integration with Unbound.
We will discuss the options you mentioned at NLnet Labs. That is, either exporting statistics via Unbound control or in another (integrated) manner, and in a format that can be used by other open source software.
An option is to include the integrated code in the Unbound contrib section, but we will need to discuss this. We do think the integration with a system like Prometheus is very useful and some code to facilitate this is valuable. So thanks.
What do you mean by the total sum for a bucket? What value is it that you want?
UBCT1 stats_noreset is already documented in the doc/control_proto_spec.txt file.
UBCT1 is a version-numbered unique prefix for the commandline input that follows.
It is also possible, and documented, to get statistics via shared memory: stats_shm is the command to unbound-control, and unbound.h exports the definitions that could be used by other programs as well. This has significantly lower CPU requirements than the TLS connection, and it is another way to integrate with Unbound's statistics.
Best regards, Wouter
(In reply to Wouter Wijngaards from comment #2)
> What do you mean with the total sum for a bucket, what value is it that you
The sum of all of the individual sample values. For example, if there are 10 samples of 1 second and 7 of 3 seconds, then the sum of the sample values would be 10 * 1 + 7 * 3 = 31.
Prometheus uses this because it samples the output regularly. By computing the rate/derivative of both the sum of all the samples and their count, one can obtain the average query latency. The histogram buckets only give insight into quantiles/percentiles, not the average within a timeframe.
> UBCT1 stats_noreset is already documented in the doc/control_proto_spec.txt
> UBCT1 is a version-numbered unique prefix for the commandline input that
Oh, wow. I overlooked that. I only browsed through the man pages. Thanks!
> It is also possible, and documented, so get statistics via shared memory,
> stats_shm is the command to unbound-control, and unbound.h exports the
> definitions that could be used by other programs as well. This has
> significantly lower CPU requirements than the TLS connection. And it is
> another way to integrate with unbound's statistics.
Awesome! Filed a tracking bug for that on the exporter's side:
Nice to hear!
The average is printed as recursion.time.avg. Not per histogram bucket, but per thread and globally. And that is the average query latency. This sounds like it is the value you are asking for (all wait times summed, divided by number of items)?
Best regards, Wouter
(In reply to Wouter Wijngaards from comment #4)
> The average is printed as recursion.time.avg. Not per histogram bucket, but
> per thread and globally. And that is the average query latency. This
> sounds like it is the value you are asking for (all wait times summed,
> divided by number of items)?
That does indeed sound useful. I wasn't sure whether total.recursion.time.avg measured the same thing as the histogram. Thanks for confirming. I've just changed unbound_exporter to make use of it:
One thing that we do need to keep an eye out for is whether this approach still guarantees monotonicity. Due to rounding and integer division, it may be the case that avg * count drops in value. That said, let's ignore that for now.
Good to hear!
The avg is calculated by keeping track of the sum. It is divided by num.recursivereplies for printing. So avg*count shouldn't drop in value...
Best regards, Wouter
The entire stack (Unbound + unbound_exporter) now computes this value:
x = avg * count = timeval_divide(sum, count) * count
As timeval_divide() rounds down its result, an increase in 'count' may yield a smaller total sum.
Yes, you are right, it could do that; if that is really a problem for you, it is easy enough to print out the timeval sum. timeval_divide() also keeps track of the timeval's usec, so the error is likely some number of microseconds.
Best regards, Wouter