Bugzilla – Bug 689
unbound-anchor hangs for 50 minutes
Last modified: 2015-07-20 14:16:57 CEST
We have a bug in Fedora with a user claiming that installation of the unbound-libs package, during which we call unbound-anchor to update the root key, hangs for 50 minutes. I'm not able to reproduce it, but the user is. I asked him to provide a backtrace from the process when it hangs:
unbound-anchor -a /var/lib/unbound/root.anchor -c /etc/unbound/icannbundle.pem
Which also hung.
I got this bt each time with gdb:
#0 0x00003fff8e0a4958 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1 0x00003fff8e76a824 in 00000043.plt_call.bufferevent_disable () from /lib64/libevent-2.0.so.5
#2 0x00003fff8e74b61c in .event_base_loop () from /lib64/libevent-2.0.so.5
#3 0x00003fff8e74cec4 in .event_base_dispatch () from /lib64/libevent-2.0.so.5
#4 0x00003fff8e8eaa6c in 0000106d.plt_call.PyTuple_New () from /lib64/libunbound.so.2
#5 0x00003fff8e965fd4 in .ub_resolve () from /lib64/libunbound.so.2
#6 0x0000000045d746b0 in 000002e1.plt_call.ub_ctx_create ()
#7 0x00003fff8dfa438c in .generic_start_main.isra () from /lib64/libc.so.6
#8 0x00003fff8dfa45b4 in .__libc_start_main () from /lib64/libc.so.6
#9 0x0000000000000000 in ?? ()
unbound-anchor is a C binary. Therefore it is strange that there is the "PyTuple_New" call, since it is part of the Python bindings in libunbound.
The machine has Internet access.
Do you, as upstream, have any idea why and how unbound-anchor could hang for such a long time and still finish successfully in the end?
Thanks in advance.
It should not hang. As for the Python call: if unbound is reading a config file (with the -C option perhaps?) and that config contains a python section, that section would be loaded and processed.
But that does not explain why it would hang. It is in ub_resolve, so it is trying to resolve data.iana.org (but why? Normally it would try to get the . DNSKEY). So it must be trying to do a . DNSKEY query.
Does the user have options like ssl-upstream set that unbound-anchor picks up via a config file in some way?
I would like to be able to reproduce it. I wonder what external cause makes it behave this way.
This is happening during the installation. This means that there is no unbound.conf installed, but even if it was, ssl-upstream is not set, so it should use the default value (which seems to be "no").
I was thinking that maybe the user's resolver from DHCP is not able to provide DNSSEC data and libunbound is trying to do the validation.
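One way to check this hypothesis without unbound-anchor is to send a ". DNSKEY" query with the EDNS0 DO ("DNSSEC OK") bit set directly to the resolver and look at the reply header. The following is only a minimal sketch: the function names, the 1232-byte UDP size and the 3-second timeout are my own assumptions, not anything from unbound, and the resolver address has to be supplied by whoever runs it.

```python
import socket
import struct

DNSKEY = 48      # RR type DNSKEY
OPT = 41         # EDNS0 pseudo-RR type
DO_BIT = 0x8000  # "DNSSEC OK" flag, carried in the OPT record's TTL field


def build_dnskey_query(query_id, udp_size=1232):
    """Build a wire-format query for '. DNSKEY' with EDNS0 and DO set."""
    # Flags 0x0100 = recursion desired; 1 question, 1 additional (the OPT RR).
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    # Root name is a single zero byte; class 1 = IN.
    question = b"\x00" + struct.pack("!HH", DNSKEY, 1)
    # OPT RR: root name, type, UDP payload size, flags (DO), empty RDATA.
    opt = b"\x00" + struct.pack("!HHIH", OPT, udp_size, DO_BIT, 0)
    return header + question + opt


def summarize_header(packet):
    """Decode the fields of the 12-byte DNS header that matter here."""
    ident, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", packet[:12])
    return {
        "id": ident,
        "qr": bool(flags & 0x8000),  # True for a response
        "tc": bool(flags & 0x0200),  # truncated, a retry over TCP is expected
        "ad": bool(flags & 0x0020),  # resolver claims the data validated
        "rcode": flags & 0x000F,
        "ancount": ancount,
    }


def probe(server, timeout_s=3.0):
    """Query `server` (an IPv4 address); None on timeout or network error."""
    query = build_dnskey_query(0x1234)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        try:
            sock.sendto(query, (server, 53))
            reply, _addr = sock.recvfrom(4096)
        except OSError:  # includes socket timeouts
            return None
    if len(reply) < 12:
        return None
    return summarize_header(reply)
```

Calling probe() with the address of the DHCP-provided resolver would show whether it answers at all. A timeout, a set tc flag, or an ancount of zero would each be a hint that DNSSEC data is not getting through, not proof.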
I could ask for a packet dump of the communication. If you have any ideas about what else would be helpful, I can ask for that too.
Yes, I would think timeouts would be fast for that. Indeed, somehow the network is down or non-responsive or something?
You could, you know, just not call unbound-anchor during installation. In the initial Makefiles I called it because I did not know how else to fetch a root anchor for the user.
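If the call does stay in the installation step, its run time could at least be bounded so that a hang cannot stall the install for 50 minutes. A minimal sketch, written in Python purely for illustration (real package scriptlets are usually shell): the function names and the 30-second limit are assumptions, and the command line is the one quoted earlier in this report.

```python
import subprocess
import sys

# Command as quoted in this bug report.
ANCHOR_CMD = [
    "unbound-anchor",
    "-a", "/var/lib/unbound/root.anchor",
    "-c", "/etc/unbound/icannbundle.pem",
]


def run_bounded(cmd, timeout_s):
    """Run cmd; return its exit status, or None if it exceeded timeout_s."""
    try:
        # On timeout, subprocess.run kills the child before raising.
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None


def update_anchor(timeout_s=30):
    """Try to refresh the root anchor without letting it block the install."""
    status = run_bounded(ANCHOR_CMD, timeout_s)
    if status is None:
        print("unbound-anchor timed out, continuing", file=sys.stderr)
    return status
```

This only hides the symptom. It does not explain the hang.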
OTOH it should work. I am interested in that packet trace. Is there anything else out of the ordinary during installation? Are there SELinux denied calls?
Best regards, Wouter