Bug 452

Summary: Service Crashed
Product: unbound Reporter: RickardW <rickard.welin>
Component: server Assignee: unbound team <unbound-team>
Severity: major CC: hb, jiri.lunacek, wouter
Priority: P5    
Version: 1.4.17   
Hardware: i386   
OS: Windows   

Description RickardW 2012-06-11 10:58:43 CEST
Log Name:      Application
Source:        unbound
Date:          2012-06-11 10:14:18
Event ID:      4
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      XXXXXX
[unbound:0] fatal error: services/mesh.c:742: mesh_state_attachment: assertion n != NULL failed
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
    <Provider Name="unbound" />
    <EventID Qualifiers="57345">4</EventID>
    <TimeCreated SystemTime="2012-06-11T08:14:18.000000000Z" />
    <Security />
    <Data>[unbound:0] fatal error: services/mesh.c:742: mesh_state_attachment: assertion n != NULL failed</Data>
Comment 1 Wouter Wijngaards 2012-06-11 12:58:57 CEST
Hi Rickard,

This means that it tried to attach a subquery to a query that already has it as a subquery. This should not happen; it is an internal error.
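The invariant described above can be illustrated with a minimal C sketch, under stated assumptions. It assumes a query state keeps its subqueries in a set whose insert returns NULL when the element is already present. Asserting that this return value is non-NULL then aborts the whole process on a duplicate attachment. The struct `query_state` and the functions `sub_set_insert`, `attach_subquery_strict` and `attach_subquery` are hypothetical names invented for this example. This is not unbound's actual mesh.c code. The tolerant variant shows only one possible way to avoid the crash and is not necessarily the fix that was committed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified query state that tracks its subqueries in a
 * small fixed-size set. NOT unbound's real mesh_state structure. */
#define MAX_SUBS 16

struct query_state {
	const char* name;
	struct query_state* subs[MAX_SUBS];
	size_t nsubs;
};

/* Set-style insert: returns the inserted element, or NULL if `sub` is
 * already in the set (duplicate attachment) or the set is full. */
static struct query_state* sub_set_insert(struct query_state* super,
	struct query_state* sub)
{
	size_t i;
	for(i = 0; i < super->nsubs; i++)
		if(super->subs[i] == sub)
			return NULL;
	if(super->nsubs == MAX_SUBS)
		return NULL;
	super->subs[super->nsubs++] = sub;
	return sub;
}

/* Strict variant: a duplicate attachment trips the assertion and aborts
 * the whole process, similar to the reported fatal error. */
static void attach_subquery_strict(struct query_state* super,
	struct query_state* sub)
{
	struct query_state* n = sub_set_insert(super, sub);
	assert(n != NULL);
	(void)n; /* silence unused-variable warnings when NDEBUG is set */
}

/* Tolerant variant: returns 1 on success and 0 if the attachment was
 * refused, so the caller can handle it instead of crashing. */
static int attach_subquery(struct query_state* super,
	struct query_state* sub)
{
	return sub_set_insert(super, sub) != NULL;
}
```

Calling `attach_subquery` twice with the same pair returns 1 and then 0, leaving a single entry in the set. Calling `attach_subquery_strict` twice with the same pair would abort the program instead.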

Do you have any information about what was going on at the time? A particular domain or set of domains that was looked up? What sort of traffic do you have?

This was unbound-1.4.17?

Best regards,
Comment 2 RickardW 2012-06-11 14:15:24 CEST
It's hard to tell which query caused it, since we serve 70-100 Windows and Mac clients in our domain.
I can't see what was going on at that time.

It is unbound-1.4.17.

Comment 3 Wouter Wijngaards 2012-06-11 14:18:27 CEST
Hi Rickard,

I have fixed the issue for you; unbound no longer needs to crash at this point.
A snapshot (it passes the regression tests, but it is a snapshot of development,
not an official release) is available here; you can use it to prevent the bug
from happening again:

But I would very much like to know how this happened. Did you configure
unbound in a special way (for example, with harden-referral-path)? Do you run
any special modules (python?) in unbound? Are particular domains looked up?

Best regards,
Comment 4 Wouter Wijngaards 2012-06-11 14:19:54 CEST
Hi Rickard,

Our comments crossed; I see you do not know the query names. Do you have stub-zone or forward-zone configuration in unbound.conf? Do those use nameserver names rather than IP addresses?

Best regards,
Comment 5 RickardW 2012-06-13 13:04:38 CEST
Nothing special in the conf file, as far as I know.

Here is most of it:

server:
    auto-trust-anchor-file: "C:\Program Files (x86)\Unbound\root.key"
    interface: ::0
    access-control: allow
    #ignore-cd-flag: yes

I'll install the snapshot release you provided and test.

Comment 6 Wouter Wijngaards 2012-07-20 15:09:48 CEST
Hi RickardW,

After a report from someone else, I have found the root cause of this issue, and also another symptom of it (an assertion failure in mesh.c, like the one you experienced). The fix is in svn trunk; I can send you a Windows executable if you like.

Best regards,
Comment 7 jiri.lunacek 2012-07-23 10:30:57 CEST

We are encountering the same issue on CentOS:
Jul 23 00:21:21 dnscache1 unbound: [1110:0] fatal error: services/mesh.c:742: mesh_state_attachment: assertion n != NULL failed

Apparently this usually happens when unbound reaches its cache size limits. However, I cannot say for sure whether that is related to this issue.

I would really appreciate a tarball or binary with this problem fixed. Since our recursors are under heavy load, replacing unbound with BIND is not an option. We ended up setting up a watchdog that restarts unbound should it crash again.

CentOS release 6.3 (Final)
kernel: 2.6.32-279.2.1.el6.x86_64
Version 1.4.17
linked libs: libevent 1.4.13-stable (it uses epoll), ldns 1.6.13, OpenSSL 1.0.0-fips 29 Mar 2010
linked modules: validator iterator
configured for x86_64-redhat-linux-gnu on Wed Jun  6 09:47:53 NOVT 2012 with options: '--build=x86_64-unknown-linux-gnu' '--host=x86_64-unknown-linux-gnu' '--target=x86_64-redhat-linux-gnu' '--program-prefix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--localstatedir=/var' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-ldns=' '--with-libevent' '--with-pthreads' '--with-ssl' '--disable-rpath' '--enable-debug' '--disable-static' '--with-conf-file=/etc/unbound/unbound.conf' '--with-pidfile=/var/run/unbound/unbound.pid' '--enable-sha2' '--disable-gost' '--disable-ecdsa'
Comment 8 Wouter Wijngaards 2012-07-23 10:34:22 CEST
Hi Jiri,

I have sent you a preview of the 1.4.18 release, a snapshot of today's svn version.

Best regards,
Comment 9 jiri.lunacek 2012-07-23 11:08:58 CEST
(In reply to comment #8)

Hi. Thank you for the snapshot. I have built it and deployed on one of our recursors. I will report back should we have any more problems with this version.

I am looking forward to the new release.

Comment 10 Henrik Bro 2012-07-23 17:47:49 CEST
(In reply to comment #8)
> Hi Jiri,
> I have send you a preview of the 1.4.18 release, a snapshot of today's svn
> version.
> Best regards,
>    Wouter

I have the same error. When do you think 1.4.18 will be released?

Best regards,
Comment 11 Wouter Wijngaards 2012-07-24 09:08:08 CEST
This week, or next week.

Best regards,