Bugzilla – Bug 415
unbound fails to resolve A record of edadfs.partners.extranet.microsoft.com
Last modified: 2011-10-24 15:36:39 CEST
I had a problem getting a valid response for at least one record:
[root@gts-new ~]# host edadfs.partners.extranet.microsoft.com 10.0.0.1
;; connection timed out; no servers could be reached
Unbound otherwise works fine, of course:
[root@gts-new ~]# host www.nlnetlabs.nl 10.0.0.1
Using domain server:
www.nlnetlabs.nl has address 22.214.171.124
www.nlnetlabs.nl has IPv6 address 2001:7b8:206:1::1
I had no problem when the query goes to a BIND DNS server (9.8.1):
[root@gts-new ~]# host edadfs.partners.extranet.microsoft.com 127.0.0.1
Using domain server:
edadfs.partners.extranet.microsoft.com has address 126.96.36.199
I tried pdns-recursor too, with success; it works well.
Info about unbound:
[root@gts-new ~]# /usr/sbin/unbound -h
usage: unbound [options]
start unbound daemon DNS resolver.
-h this help
-c file config file to read instead of /etc/unbound/unbound.conf
file format is described in unbound.conf(5).
-d do not fork into the background.
-v verbose (more times to increase verbosity)
linked libs: libevent 1.4.13-stable (it uses epoll), ldns 1.6.8, OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008
linked modules: validator iterator
configured for x86_64-redhat-linux-gnu on Tue Oct 11 14:33:36 CEST 2011 with options: '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--target=x86_64-redhat-linux-gnu' '--program-prefix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--localstatedir=/var' '--sharedstatedir=/usr/com' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-ldns=' '--with-libevent' '--with-pthreads' '--with-ssl' '--disable-rpath' '--enable-debug' '--disable-static' '--with-conf-file=/etc/unbound/unbound.conf' '--with-pidfile=/var/run/unbound/unbound.pid' '--enable-sha2' '--disable-gost'
BSD licensed, see LICENSE in source package for details.
Report bugs to firstname.lastname@example.org
Running on RHEL 5.7. Version 1.4.12 is buggy too: no response. Unbound installed from the rawhide repo on Fedora 17 (rawhide installation, 1.4.13) fails as shown above with the standard configuration; I did not change anything in unbound.conf.
Okay, so it tries to resolve and hits partners.extranet.microsoft.com:
partners.extranet.microsoft.com. 3600 IN NS dns13.one.microsoft.com.
partners.extranet.microsoft.com. 3600 IN NS dns11.one.microsoft.com.
partners.extranet.microsoft.com. 3600 IN NS dns12.one.microsoft.com.
partners.extranet.microsoft.com. 3600 IN NS dns10.one.microsoft.com.
;; ADDITIONAL SECTION:
dns13.one.microsoft.com. 3600 IN A 188.8.131.52
dns11.one.microsoft.com. 3600 IN A 184.108.40.206
dns12.one.microsoft.com. 3600 IN A 220.127.116.11
dns10.one.microsoft.com. 3600 IN A 18.104.22.168
22.214.171.124 is recursive: not an authoritative server but a cache.
126.96.36.199 is recursive: not an authoritative server but a cache.
188.8.131.52 is lame: it does not serve partners.extranet.microsoft.com.
184.108.40.206 is down: it times out and does not answer.
The first pass does not yield a server that unbound wants to talk to (it tries 131... a couple of times, but it keeps timing out). So it attempts to fetch more choices by searching for other nameservers for this domain, but there is no IPv6 deployed or anything like that.
So it ends up choosing from a list of bad choices. It sends the query to the recursive server (it may be cache poisoned, but it's the only way right now). This then provides the answer for the A record. It works.
When the next query comes along, however, it wants to avoid the timeout server. The server selection has a bug here: it picks this timeout server and rejects it, but then rejects the entire query instead of going on to the other choices.
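To illustrate the failure mode described above, here is a minimal sketch in Python. It is not unbound's actual code: the `Server` type, the status labels, the ranking and both function names are hypothetical, chosen only to mirror the situation in this report (two caches, one lame server, one server that times out).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str    # hypothetical label, not a real host name
    status: str  # "ok", "timeout", "recursive" or "lame" (assumed labels)


# Lower rank is more preferred. A server that looks authoritative but times
# out ranks above the caches. Lame servers are not listed, so they are never
# candidates for this zone.
RANK = {"ok": 0, "timeout": 1, "recursive": 2}


def candidates(servers: List[Server]) -> List[Server]:
    """Return usable servers, most preferred first (stable order)."""
    usable = [s for s in servers if s.status in RANK]
    return sorted(usable, key=lambda s: RANK[s.status])


def select_rejecting(servers: List[Server]) -> Optional[Server]:
    """Buggy variant: if the top pick is the timeout server, give up."""
    cands = candidates(servers)
    if not cands or cands[0].status == "timeout":
        return None  # the entire query is rejected
    return cands[0]


def select_with_fallback(servers: List[Server]) -> Optional[Server]:
    """Intended variant: skip the timeout server and try the next choice."""
    for server in candidates(servers):
        if server.status != "timeout":
            return server
    return None


# The four servers from the analysis, in the order they are listed there.
SERVERS = [
    Server("cache-a", "recursive"),
    Server("cache-b", "recursive"),
    Server("lame", "lame"),
    Server("down", "timeout"),
]
```

With `SERVERS` as input, `select_rejecting` returns `None` (the query fails), while `select_with_fallback` returns the first cache, which matches the "choose from a list of bad choices" behaviour described for the first query.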
Thanks for the report. This particular misconfiguration triggers an interesting codepath, and this is a bug in the server selection.
One workaround in unbound.conf is an entry that blacklists the address of the timeout server. This is (probably) harmless, and it helps server selection work around this bug by asking the other servers. (It is not harmless if that server is the only one serving other zones for which it does not time out.)
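As a sketch of what such a blacklist entry could look like, assuming the workaround meant is the `do-not-query-address:` option of unbound.conf. The option name is an assumption about what was intended, and the address below is a placeholder from the documentation range, not the real address of the timeout server:

```
server:
    # Assumption: replace 192.0.2.1 with the address of the nameserver
    # that times out. Unbound will then never send queries to it.
    do-not-query-address: 192.0.2.1
```

Unbound needs to be restarted or reloaded for the change to take effect.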
If you can influence the servers, please tell the DNS operators to fix their servers so that they deploy authority servers, not caches. That would also work around this bug. I.e. they are now running 'unbound' (or something like it) on those servers, but they should be running 'nsd' (or something like it). (In PowerDNS terms they run pdns-recursor where they should run pdns; for BIND it is a configuration problem: the zone needs to be a slave or master zone.)
Of course it should be fixed in the code, but the above workarounds may help you in the short term.
The bug has been fixed in svn trunk r2522 of unbound (this is the next release under development). It works for me now: it spends time on the unresponsive server to see if it responds (it would be preferred, if only it answered), so it may take 10-20 seconds, but once it has determined that the server is down, resolution works for the misconfigured domain.
In general the probes towards the timeout server will slow down resolution of this domain. This is because we want security: the other options are (cache-poisoned?) caches, and the burden (slow resolution) is shouldered by someone who made a lot of mistakes configuring their DNS servers.
Thank you for the response and the quick fix :) For now I am using another workaround:
local-data: "edadfs.partners.extranet.microsoft.com IN A 220.127.116.11"
I have no special contact at Microsoft to report the bad DNS configuration to. :)