Bug 3114 - Add total sum to histogram metric / Add integrated support for generating Prometheus metrics?
Status: ASSIGNED
Product: unbound
Classification: Unclassified
Component: server
Version: unspecified
Hardware: Other
OS: All
Importance: P5 enhancement
Assigned To: unbound team
Depends on:
Blocks:
Reported: 2017-11-21 10:05 CET by Ed Schouten
Modified: 2017-12-04 11:42 CET
CC: 4 users

Description Ed Schouten 2017-11-21 10:05:59 CET
Hi there,

We at Kumina are the maintainers of this piece of software:

https://github.com/kumina/unbound_exporter

It's a tool that connects to Unbound's control socket, sends it the "UBCT1 stats_noreset" command and converts the statistics it receives from Unbound to the format supported by the Prometheus monitoring system:

https://prometheus.io/

Though this exporter works pretty well and has a loyal fanbase, there are a couple of things that could be improved:

--------

- https://github.com/kumina/unbound_exporter/blob/master/unbound_exporter.go#L314-L324

The command it sends to Unbound, "UBCT1 stats_noreset", is not documented. We only figured out its existence after inspecting unbound-control's source code. We don't want to invoke unbound-control here, as forking and executing a process can add significant overhead for some workloads (e.g. high-frequency monitoring), and "shelling out" in Prometheus exporters tends to be frowned upon. (A sketch of talking to the control socket directly is included below.)

- https://github.com/kumina/unbound_exporter/blob/master/unbound_exporter.go#L284-L286

The histogram.* metrics don't map cleanly to a Prometheus histogram. In addition to bucketing the number of samples, Prometheus also likes to know the aggregated sum of all of those samples. This can be stored in constant space and has the advantage that it allows for the computation of average request latency over time.

Unbound currently exports only a per-bucket count, not the total sum, so we have to guess this value by assuming the samples are uniformly distributed within every bucket. (A second sketch below shows what a histogram with a sum looks like.)

--------
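
To illustrate the first point: a minimal Go sketch of talking to the control socket directly instead of shelling out to unbound-control. It assumes a local unix-domain control-interface configured without certificates; the socket path and statistic name are illustrative, and this is not the exporter's actual code.

package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// fetchStats connects to Unbound's control socket, sends the versioned
// command prefix plus "stats_noreset", and parses the "name=value" reply
// lines into a map.
func fetchStats(socketPath string) (map[string]string, error) {
	conn, err := net.Dial("unix", socketPath)
	if err != nil {
		return nil, err
	}
	defer conn.Close()

	if _, err := fmt.Fprint(conn, "UBCT1 stats_noreset\n"); err != nil {
		return nil, err
	}

	stats := make(map[string]string)
	scanner := bufio.NewScanner(conn)
	for scanner.Scan() {
		if name, value, ok := strings.Cut(scanner.Text(), "="); ok {
			stats[name] = value
		}
	}
	return stats, scanner.Err()
}

func main() {
	stats, err := fetchStats("/run/unbound.ctl") // illustrative path
	if err != nil {
		panic(err)
	}
	fmt.Println(stats["total.num.queries"])
}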
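And for the second point: a sketch, using the prometheus/client_golang const-metric API, of what the exporter could emit if Unbound exposed a total sum next to the bucket counts. The metric name, bucket bounds, sample values and port are made up for illustration.

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Hypothetical metric; not Unbound's actual statistics output.
var rtDesc = prometheus.NewDesc(
	"unbound_response_time_seconds",
	"Query response time.",
	nil, nil,
)

type unboundCollector struct{}

func (unboundCollector) Describe(ch chan<- *prometheus.Desc) { ch <- rtDesc }

func (unboundCollector) Collect(ch chan<- prometheus.Metric) {
	// Cumulative bucket counts keyed by upper bound (in seconds).
	buckets := map[float64]uint64{1: 10, 4: 17}
	// A Prometheus histogram needs the bucket counts, the sample count and,
	// crucially, the sum of all sample values (here 10*1s + 7*3s = 31s).
	ch <- prometheus.MustNewConstHistogram(rtDesc, 17, 31, buckets)
}

func main() {
	reg := prometheus.NewRegistry()
	reg.MustRegister(unboundCollector{})
	http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
	log.Fatal(http.ListenAndServe(":9167", nil))
}

When scraped, this produces the _bucket, _sum and _count series that make up a native Prometheus histogram.
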

I'm getting in touch because I'd like to explore what we can do to improve these things. In the short term, we could document "UBCT1 stats_noreset" and extend the Unbound code to also export an aggregated sum of the latency histogram.

In the long term, maybe it makes sense to follow a more radical approach. In my opinion, Prometheus is the only monitoring system that has a sane, extensible, yet simple and well documented format for how metrics should be exported:

https://prometheus.io/docs/instrumenting/exposition_formats/

Maybe Unbound could gain integrated support for exposing stats in Prometheus' format, eventually removing the need for our separate exporter (and "UBCT1 stats_noreset") entirely? This has the advantage that we no longer need to spend any effort to keep the exporter in sync with the implementation.

Finally, I'd like to mention that if integrated support for Prometheus' format is out of the question, we at Kumina are open to transferring copyright of the exporter to NLnet Labs, so that it may become part of the Unbound source distribution.
Comment 1 Benno Overeinder 2017-11-21 16:21:00 CET
Hi Ed,

Thank you for your message on the Prometheus monitoring system and the integration with Unbound.  

We will discuss the options you mentioned at NLnet Labs: exporting statistics either via Unbound control or in another (integrated) manner, in a format that can be used by other open source software.

One option is to include the integrated code in the Unbound contrib section, but we will need to discuss this.  We do think that integration with a system like Prometheus is very useful and that code to facilitate it is valuable.  So thanks.

Best regards,

-- Benno
Comment 2 Wouter Wijngaards 2017-11-28 14:59:17 CET
Hi,

What do you mean by the total sum for a bucket? What value is it that you want?

UBCT1 stats_noreset is already documented in the doc/control_proto_spec.txt file.
UBCT1 is a version-numbered unique prefix for the commandline input that follows.

It is also possible, and documented, to get statistics via shared memory: stats_shm is the command to unbound-control, and unbound.h exports the definitions that can be used by other programs as well.  This has significantly lower CPU requirements than the TLS connection, and it is another way to integrate with unbound's statistics.

Best regards, Wouter
Comment 3 Ed Schouten 2017-11-28 16:14:39 CET
(In reply to Wouter Wijngaards from comment #2)
> What do you mean with the total sum for a bucket, what value is it that you
> want?

The sum of all of the individual sample values. For example, if there are 10 samples of 1 second and 7 of 3 seconds, then the sum of the sample values would be 10 * 1 + 7 * 3 = 31.

Prometheus uses this because it samples the output regularly. By computing the rate/derivative of both the sum of all the samples and their count, one can obtain the average query latency. The histogram buckets only give insight into quantiles/percentiles, not the average within a timeframe.
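
As a toy illustration (made-up numbers, not exporter code), the average over a scrape interval falls out of the deltas of the two series:

package main

import "fmt"

func main() {
	// scrape 1: 17 samples totalling 31 seconds (10*1s + 7*3s, as above)
	sum1, count1 := 31.0, 17.0
	// scrape 2, one interval later: 12 more samples totalling 18 more seconds
	sum2, count2 := 49.0, 29.0

	// rate(sum)/rate(count) over the interval == average latency in it
	fmt.Printf("average latency: %.2fs\n", (sum2-sum1)/(count2-count1)) // 1.50s
}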

> UBCT1 stats_noreset is already documented in the doc/control_proto_spec.txt
> file.
> UBCT1 is a version-numbered unique prefix for the commandline input that
> follows.

Oh, wow. I overlooked that. I only browsed through the man pages. Thanks!

> It is also possible, and documented, so get statistics via shared memory,
> stats_shm is the command to unbound-control, and unbound.h exports the
> definitions that could be used by other programs as well.  This has
> significantly lower CPU requirements than the TLS connection.  And it is
> another way to integrate with unbound's statistics.

Awesome! Filed a tracking bug for that on the exporter's side:

https://github.com/kumina/unbound_exporter/issues/7
Comment 4 Wouter Wijngaards 2017-11-28 16:34:51 CET
Hi Ed,

Nice to hear!

The average is printed as recursion.time.avg, not per histogram bucket, but per thread and globally.  That is the average query latency.  This sounds like the value you are asking for (all wait times summed, divided by the number of items)?

Best regards, Wouter
Comment 5 Ed Schouten 2017-12-04 10:42:41 CET
Hi Wouter,

(In reply to Wouter Wijngaards from comment #4)
> The average is printed as recursion.time.avg.  Not per histogram bucket, but
> per thread and globally.  And that is the average query latency.  This
> sounds like it is the value you are asking for (all wait times summed,
> divided by number of items)?

That does indeed sound useful. I wasn't sure whether total.recursion.time.avg measured the same thing as the histogram. Thanks for confirming. I've just changed unbound_exporter to make use of it:

https://github.com/kumina/unbound_exporter/commit/4f36729f553665a4268b5c265448977276a95096

One thing that we do need to keep an eye out for is whether this approach still guarantees monotonicity. Due to rounding and integer division, it may be the case that avg * count drops in value. That said, let's ignore that for now.
Comment 6 Wouter Wijngaards 2017-12-04 10:56:18 CET
Hi Ed,

Good to hear!

The avg is calculated by keeping track of the sum.  It is divided by num.recursivereplies for printing.  So avg*count shouldn't drop in value...

Best regards, Wouter
Comment 7 Ed Schouten 2017-12-04 11:29:26 CET
The entire stack (Unbound + unbound_exporter) now computes this value:

x = avg * count = timeval_divide(sum, count) * count

As timeval_divide() rounds its result down, an increase of 'count' may yield a smaller reconstructed total sum.
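
For illustration, a toy example (made-up microsecond values, with plain floor division standing in for timeval_divide()) of how the reconstructed sum can shrink even though the real sum only grows:

package main

import "fmt"

// reconstructed mimics avg*count where avg is sum/count rounded down to
// whole microseconds.
func reconstructed(sumUsec, count int64) int64 {
	return (sumUsec / count) * count
}

func main() {
	fmt.Println(reconstructed(10, 3)) // real sum 10us -> reconstructed 9us
	fmt.Println(reconstructed(11, 4)) // real sum 11us -> reconstructed 8us: it dropped
}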
Comment 8 Wouter Wijngaards 2017-12-04 11:42:03 CET
Hi Ed,

Yes, you are right, it could do that; if that is really a problem for you, it is easy enough to print out the timeval sum.  timeval_divide() also keeps track of the timeval.usec, so the error is likely some number of microseconds.

Best regards, Wouter