Bugzilla – Bug 416
Unbound's way of storing host-down cache can be exploited for a partial DOS behavior
Last modified: 2011-10-27 10:34:14 CEST
Well-formed resolution requests sent at regular intervals from the internal network to the DNS resolver (an unbound server) can make every domain hosted by a particular registrar unresolvable. Those domains become unavailable to all customers who depend on that caching (unbound) server.
Steps to Reproduce
Here is a recipe to reproduce the bug on any unbound server (it cannot be reproduced on bind or tinydns).
1) Write a script that repeatedly sends the following request to your resolving unbound server (yes, the zone is broken, but unbound should continue to work normally, and this should not cause the side effects shown below):
while [ 1 ]; do dig @myserver www.coolbox.be; sleep 10; done &
2) try to resolve for example the following domains against @myserver
for domain in e-zone.fr hopitalerasme.eu eguissard.be estates.lu; do dig @myserver mx $domain; done
That should work: you get NOERROR and ANSWER records back.
3) now wait a couple of minutes while keeping the loop with coolbox running.
4) If you redo the previous check, it will work because the answers are still cached.
5) So in the meantime let us try different domains to check as long as the cache is not yet emptied:
for domain in cypres.com actito.com; do dig @myserver www.$domain; done
This does not work any more.
Actual result: SERVFAIL
Expected result: NOERROR and ANSWER records.
Once the earlier results (step 2) are flushed from the cache, the earlier checks no longer work 100% of the time either. Some of them may work occasionally until unbound's cache is filled up again.
Now all domains from a particular registrar are unreachable. This is a sort of DoS for the customers depending on unbound's response, isn't it? An answer is available, but the server cannot find it.
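The manual checks in steps 2 and 5 can be sketched as a small helper that prints the response code for each name. This is only an illustration: the function names rcode_of and check_names are made up for this example, and it assumes dig prints its usual ";; ->>HEADER<<- ... status: ..." header line.

```shell
#!/bin/sh
# Extract the response code from dig output read on stdin.
# dig prints it in the header line, for example:
#   ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 4242
rcode_of() {
    sed -n 's/^;; ->>HEADER<<-.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Query each name once and print "<name> <rcode>".
# If no header line is found (for example on a timeout), print TIMEOUT.
# $1 is the resolver to ask; the remaining arguments are query names.
check_names() {
    server=$1
    shift
    for name in "$@"; do
        rcode=$(dig "@$server" "$name" 2>/dev/null | rcode_of)
        printf '%s %s\n' "$name" "${rcode:-TIMEOUT}"
    done
}
```

Called as `check_names myserver www.cypres.com www.actito.com` while the coolbox.be loop is running, this would be expected to show SERVFAIL for both names according to the report above.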
Additional Information and analysis:
As the authoritative nameserver does not respond to requests for coolbox.be, unbound builds up a negative cache entry for two complete nameservers.
unbound-control dump_infra | grep -E "18.104.22.168|22.214.171.124"
126.96.36.199 ttl 286 ping 0 var 94 rtt 376 rto 120000 ednsknown 0 edns 0 delay 0
188.8.131.52 ttl 291 ping 0 var 94 rtt 376 rto 120000 ednsknown 0 edns 0 delay 0
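To pick such entries out of a larger dump, a filter along the following lines may help. It is a sketch that assumes the dump_infra line layout shown above (address first, then keyword/value pairs) and assumes that an rto of 120000 marks a host unbound currently treats as down; the function name infra_down_hosts is made up for this example.

```shell
#!/bin/sh
# Print the address of every dump_infra line whose "rto" value is at or
# above a threshold (default 120000, the value seen in the lines above).
# The rto field is located by keyword rather than by fixed position.
infra_down_hosts() {
    awk -v limit="${1:-120000}" '
        {
            for (i = 2; i < NF; i++) {
                if ($i == "rto" && $(i + 1) + 0 >= limit + 0) {
                    print $1
                    break
                }
            }
        }'
}
```

It would be used as `unbound-control dump_infra | infra_down_hosts`.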
This negative cache is nearly permanent because of the infinite loop. Subsequent requests for domains hosted on the same nameservers (requests for which an answer is available) are not processed by unbound because it has blacklisted the servers as a whole. This is bad behavior (see our example) and goes against RFC 2308, which states that negative cache entries must cover the tuple <query name, type, class, server IP address> and not, as seems to be the case here, only the <server IP address>.
Build Date & Platform
unbound-1.4.8-1.el6.x86_64 on Redhat Linux 5.x
This is because unbound currently caches downtime per IP address.
The problem is that the registrar drops queries for domains it is not configured for. It should send SERVFAIL or upward delegations. Unbound interprets the timeout as a machine that is down.
I saw your email to the unbound-users list before. The fix may be to store downtime per IP address and delegation point.
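The difference between the two keying schemes can be shown with a toy model. This is not unbound's code: the helpers infra_key, mark_down and is_down, the mode names and the address 192.0.2.53 are all invented for the illustration.

```shell
#!/bin/sh
# Toy host-down cache: a string of "|key|" entries.
down_keys=""

# Build the cache key. Mode "per_ip" ignores the zone (old behavior as
# described in this report); any other mode keys on address and zone.
infra_key() {
    if [ "$1" = per_ip ]; then
        printf '%s' "$2"
    else
        printf '%s/%s' "$2" "$3"
    fi
}

# Record a key as down.
mark_down() {
    down_keys="$down_keys|$1|"
}

# Succeed if the key was recorded as down.
is_down() {
    case "$down_keys" in
        *"|$1|"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With per_ip keys, timeouts for coolbox.be at 192.0.2.53 also make is_down true for cypres.com at the same address. With address-and-zone keys, only the coolbox.be entry is affected, so other zones on that server stay reachable.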
The RFC you quote and "negative caching" are not really appropriate here: they deal with NXDOMAIN and NODATA answers, whereas this is about caching host downtime.
Nevertheless, we want unbound to work with the internet out there. Some authority servers show this bad behaviour (dropping the incoming query forces the sender into a timeout, which slows it down anyhow) and we want to be able to resolve the zones they do serve even if they have lame zones (i.e. someone points to them for a zone but they do not host it).
There is a fixed version in the svn trunk of unbound. If you try the setup that you describe and run unbound-control dump_infra | sort, you can see that the IP address for register.be has several entries: one that is blacklisted, and others that work fine.