Bug 1144 - Unbound could not be stopped correctly with enabled DNS TAP socket
Unbound could not be stopped correctly with enabled DNS TAP socket
Status: ASSIGNED
Product: unbound
Classification: Unclassified
Component: server
1.5.10
x86_64 Linux
: P5 major
Assigned To: unbound team
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2016-10-27 15:58 CEST by Pavel Odintsov
Modified: 2016-11-16 17:02 CET (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Pavel Odintsov 2016-10-27 15:58:24 CEST
Hello!

I'm testing DNS Tap capability with following configuration:
server:
    interface: 0.0.0.0
    port: 1153
    pidfile: "/var/run/unbound/unbound.pid"
    chroot: ""
    log-queries: yes
    username: root
    do-ip6: no
    logfile: "/var/log/unbound.log"
    verbosity: 10
dnstap:
    dnstap-enable: yes
    dnstap-socket-path: "/tmp/dnstap.sock"

    dnstap-send-identity: no
    dnstap-send-version: no

    dnstap-log-client-query-messages: yes
    dnstap-log-client-response-messages: yes
    dnstap-log-resolver-query-messages: yes
    dnstap-log-resolver-response-messages: yes
    dnstap-log-forwarder-query-messages: yes
    dnstap-log-forwarder-response-messages: yes

And I'm using this dnstap client: https://github.com/dnstap/dnstap-ldns 

Reproduce list:
1) sudo dnstap -u /tmp/dnstap.sock
2) Start unbound in foreground: sudo /usr/sbin/unbound -c /home/vagrant/unbound_dnstap.conf -d
3) Execute some query to this Unbound instance
4) Try to shutdown Unbound instance correctly with CTRL+C. It handled up until you stopped DNS TAP client.

Thanks!
Comment 1 Wouter Wijngaards 2016-10-27 16:57:33 CEST
Hi Pavel,

That exactly went wrong?  If you start unbound with -dd it logs to stderr on the console and with -vvvvv it enables maximum verbosity.

What does unbound printout when it goes wrong, and what do you mean with "It handled up until you stopped DNS TAP client.".  What handled and what happened when dns tap was stopped?

Best regards, Wouter
Comment 2 Pavel Odintsov 2016-10-27 17:27:57 CEST
Hello, Wouter!

Expected behaviour is "correct Unbound shutdown".

But I see following.

I started up DNS TAP client tool:
sudo dnstap -u /tmp/dnstap.sock
dnstap: opened input socket /tmp/dnstap.sock

dnstap.FrameStreamSockInput: accepted a socket connection

Then I started Unbound:
sudo /usr/sbin/unbound -c /home/vagrant/unbound_dnstap.conf -d -vvvvv

Then I tried to shutdown it with CTRL+C:
[1477581953] unbound[5983:0] notice: Start of unbound 1.5.11.
[1477581953] unbound[5983:0] debug: creating udp4 socket 0.0.0.0 1153
[1477581953] unbound[5983:0] debug: creating tcp4 socket 0.0.0.0 1153
[1477581953] unbound[5983:0] error: cannot open pidfile /var/run/unbound/unbound.pid: No such file or directory
[1477581953] unbound[5983:0] debug: cannot chown 0.0 /var/run/unbound/unbound.pid: No such file or directory
[1477581953] unbound[5983:0] debug: chdir to /etc/unbound
[1477581953] unbound[5983:0] debug: drop user privileges, run as root
[1477581953] unbound[5983:0] debug: switching log to /var/log/unbound.log


^C^C^C^C^C

Here you could find log content for this time:
[1477581953] unbound[5983:0] info: start of service (unbound 1.5.11).

[1477581961] unbound[5983:0] info: service stopped (unbound 1.5.11).
[1477581961] unbound[5983:0] debug: stop threads
[1477581961] unbound[5983:0] debug: cleanup.
[1477581961] unbound[5983:0] info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch
[1477581961] unbound[5983:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
[1477581961] unbound[5983:0] info: mesh has 0 recursion states (0 with reply, 0 detached), 0 waiting replies, 0 recursion replies sent, 0 replies dropped, 0 states jostled out
[1477581961] unbound[5983:0] debug: cache memory msg=66072 rrset=66072 infra=2632 val=66344
[1477581961] unbound[5983:0] notice: closing dnstap socket

But Unbound daemon is not stopped. There is only one way to stop it correctly: shutdown DNS TAP client tool. 

Then I could see:
[1477582063] unbound[5983:0] debug: Exit cleanup.
[1477582063] unbound[5983:0] debug: switching log to stderr
Comment 3 Wouter Wijngaards 2016-10-27 17:47:35 CEST
Hi Pavel,

At this point unbound simply calls:
        fstrm_iothr_destroy(&env->iothr);
And this is taking time.  It seems to be inside the fstrm library (from dnstap).  Not sure how to fix that?

Best regards, Wouter
Comment 4 Wouter Wijngaards 2016-10-27 17:49:52 CEST
Hi Pavel,

It's documentation says:
 * Destroy an `fstrm_iothr` object. This signals the background I/O thread to
 * flush or discard any queued data frames and deallocates any resources used
 * internally. This function blocks until the I/O thread has terminated.
So I guess the I/O thread has not terminated.  Not sure why?  Are the contents not properly drained and it is waiting for some last packets to be sent?  Otherwise, it doesn't seem to be an unbound bug.

But clearly, undesired behaviour.  Not sure what to do about it from unbound's code.

Best regards, Wouter
Comment 5 Robert Edmonds 2016-10-27 20:29:53 CEST
Hi, Pavel:

This is definitely a problem in fstrm, not in Unbound. I would recommend opening an issue in the fstrm issue tracker: https://github.com/farsightsec/fstrm/issues.
Comment 6 Pavel Odintsov 2016-11-02 15:10:34 CET
Thanks for attention! I'm going to create issue for fstrm project.
Comment 7 Pavel Odintsov 2016-11-16 17:02:43 CET
Filled bug to fstrm project: https://github.com/farsightsec/fstrm/issues/27