Bug 558 - failed prefetch stops serving cached content
failed prefetch stops serving cached content
Status: RESOLVED FIXED
Product: unbound
Classification: Unclassified
Component: server
1.4.21
Other Linux
: P5 normal
Assigned To: unbound team
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2014-02-26 11:43 CET by Justin
Modified: 2014-05-28 14:14 CEST (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Justin 2014-02-26 11:43:35 CET
I'm looking into the prefetch functionality in Unbound 1.4.21 (could not select this version on submitting the bug report)

Using the following configuration as a local forwarding/caching server:

server:
    prefetch: yes
    prefetch-key: yes
    forward-zone:
        name: "domain.eu"
        forward-addr: 221.221.221.221


By default unbound tries to prefetch when there is only 10% left of the TTL of an cached object.
This works fine.

But I discovered that when the authoritative or forwarded server is not reachable the unbound server will stop serving the cached object after prefetch failed.
You would expect the cache to be valid till TTL drops to 0.

To make it easier to debug I recompiled Unbound with a 50% threshold for prefetch.

util/data/msgreply.h: #define PREFETCH_TTL_CALC(ttl) ((ttl) - (ttl)/2)

Here the expect behaviour, after 30 seconds a successful prefetch takes place:

via forwarder     : forwardertest.domain.eu. 35	IN	A	123.123.123.111
from authoritative: forwardertest.domain.eu. 60	IN	A	123.123.123.111

via forwarder     : forwardertest.domain.eu. 34	IN	A	123.123.123.111
from authoritative: forwardertest.domain.eu. 60	IN	A	123.123.123.111

via forwarder     : forwardertest.domain.eu. 33	IN	A	123.123.123.111
from authoritative: forwardertest.domain.eu. 60	IN	A	123.123.123.111

via forwarder     : forwardertest.domain.eu. 31	IN	A	123.123.123.111
from authoritative: forwardertest.domain.eu. 60	IN	A	123.123.123.111

via forwarder     : forwardertest.domain.eu. 30	IN	A	123.123.123.111
from authoritative: forwardertest.domain.eu. 60	IN	A	123.123.123.111

via forwarder     : forwardertest.domain.eu. 59	IN	A	123.123.123.111
from authoritative: forwardertest.domain.eu. 60	IN	A	123.123.123.111

via forwarder     : forwardertest.domain.eu. 58	IN	A	123.123.123.111
from authoritative: forwardertest.domain.eu. 60	IN	A	123.123.123.111

via forwarder     : forwardertest.domain.eu. 56	IN	A	123.123.123.111
from authoritative: forwardertest.domain.eu. 60	IN	A	123.123.123.111

via forwarder     : forwardertest.domain.eu. 55	IN	A	123.123.123.111
from authoritative: forwardertest.domain.eu. 60	IN	A	123.123.123.111

But If I make the authoritative server unreachable (reject by iptables) after the object has been cache by Unbound, the following happens:

via forwarder     : forwardertest.domain.eu. 38	IN	A	123.123.123.111
from authoritative: unreachable/no answer

via forwarder     : forwardertest.domain.eu. 36	IN	A	123.123.123.111
from authoritative: unreachable/no answer

via forwarder     : forwardertest.domain.eu. 34	IN	A	123.123.123.111
from authoritative: unreachable/no answer

via forwarder     : forwardertest.domain.eu. 32	IN	A	123.123.123.111
from authoritative: unreachable/no answer

via forwarder     : forwardertest.domain.eu. 30	IN	A	123.123.123.111
from authoritative: unreachable/no answer

via forwarder     : unreachable/no answer
from authoritative: unreachable/no answer

via forwarder     : unreachable/no answer
from authoritative: unreachable/no answer

in the logs I see:

Feb 26 10:59:23 localhost unbound: [7175:9] info: error sending query to auth server 221.221.221.221 port 53
Feb 26 10:59:23 localhost unbound: [7175:9] info: processQueryTargets: forwardertest.domain.eu. A IN
Feb 26 10:59:23 localhost unbound: [7175:9] info: sending query: forwardertest.domain.eu. A IN
Feb 26 10:59:23 localhost unbound: [7175:9] notice: sendto failed: Operation not permitted
Feb 26 10:59:23 localhost unbound: [7175:9] notice: remote address is 221.221.221.221 port 53

I' a correct to assume this is a bug?

Gr, Justin
Comment 1 Wouter Wijngaards 2014-02-26 14:13:18 CET
Hi Justin,

Yes this is unwanted.  However, just to not-cache it would give an endless repeat of attempts while the servers are unresponsive.  This should also be rate-limited somehow, otherwise it'll hammer endlessly in last 10% of ttl.  Thanks for the test results as well.  This is something I would want to fix (although I think it is not actually a flaw in the code but an omission).

Best regards,
   Wouter
Comment 2 Justin 2014-03-19 17:47:37 CET
Hi,

Has it already been added to the roadmap?

Please le me know when you have something you want to have tested.

Gr, Justin
Comment 3 Wouter Wijngaards 2014-04-10 15:59:13 CEST
Hi Justin,

The bug has been fixed in svn trunk.  It is fairly tricky, but a small fix; instead of caching the SERVFAIL it delays the prefetch fetch for a couple of seconds.  Then it will attempt again.  Meanwhile, it still has the cached entry (that slowly expires) that it serves to customers.

You could use the svn trunk of unbound if you want to test this change.  svn co http://unbound.net/svn/trunk fetches the svn trunk source.

This must have been pretty hard to spot this problem, thank you very much for reporting it. :-)

Best regards,
   Wouter
Comment 4 Justin 2014-05-28 14:14:43 CEST
Hi,

Just tested trunk r3140.
Patch seems to work as expected while repeating the tests.
Thank you very much.

Gr, Justin