Bug 321 - unbound hangs up with 100% cpu usage - resolves after restart
unbound hangs up with 100% cpu usage - resolves after restart
Status: RESOLVED FIXED
Product: unbound
Classification: Unclassified
Component: server
1.4.6
i386 Linux
: P2 normal
Assigned To: unbound team
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2010-08-12 19:05 CEST by Günter Grodotzki
Modified: 2010-08-19 15:01 CEST (History)
1 user (show)

See Also:


Attachments
my unbound config (647 bytes, text/plain)
2010-08-12 19:06 CEST, Günter Grodotzki
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Günter Grodotzki 2010-08-12 19:05:20 CEST
unbound continually hangs up causing process to use 100% cpu.

after killing and restarting unbound, it works again. this makes me think this is neither a server resource problem nor an attack.

via iftop / netstat I do not see more traffic or connections.

Did not have this problem with prior versions but started with 1.4.5 and unfortunately did not resolve with 1.4.6


anybody having similar problems? config via attachement - thank you
Comment 1 Günter Grodotzki 2010-08-12 19:06:32 CEST
Created attachment 141 [details]
my unbound config
Comment 2 Wouter Wijngaards 2010-08-13 08:56:24 CEST
Could you test if this is because of your use-caps-for-id setting?  This had code fixed in 1.4.5 ...
Also, if it hangs up again, could you do a gcore <procesid> <file> and then with gdb /usr/local/sbin/unbound <file>
gdb> bt
and send that information to me?

You may have to compile --with-debug to get more debug symbols if you distro 'strips' it or install the debuginfo package to get debug symbols for gdb.  (
debuginfo-install unbound for fedora for example).

Best regards,   Wouter
Comment 3 Wouter Wijngaards 2010-08-16 16:18:04 CEST
Have you been able to get more information, Gunter?

Best regards,   Wouter
Comment 4 Günter Grodotzki 2010-08-17 01:24:09 CEST
thanks for the help. I will do and report here as soon as it happens again.

I'm presuming as this does not happen after x-days/x-queries this is some fault caused be a certain query?

and true, I also used use-caps-for-id only since 1.4.5
Comment 5 Günter Grodotzki 2010-08-18 18:16:35 CEST
i have no idea what I just did.. hope it helps?

#0  0xb7c2bff4 in __ctype_tolower_loc () from /lib/i686/cmov/libc.so.6
(gdb) bt
#0  0xb7c2bff4 in __ctype_tolower_loc () from /lib/i686/cmov/libc.so.6
#1  0x080a81ff in query_dname_compare (d1=0xd53b664 "4\005x3839\005x3840\002rs\004ripe\003net", d2=0x15b6342c "8\005x3839\005x3840\002rs\004ripe\003net")
    at /usr/include/ctype.h:193
#2  0x08082b0c in delegpt_find_ns (dp=0x15b61cb0, name=0xd53b660 "\004ns84\005x3839\005x3840\002rs\004ripe\003net", namelen=30) at iterator/iter_delegpt.c:131
#3  0x08082bf6 in delegpt_add_target (dp=0x15b61cb0, region=0x15b61b20, name=0xd53b660 "\004ns84\005x3839\005x3840\002rs\004ripe\003net", namelen=30, addr=0xbf86d76c, 
    addrlen=16, bogus=0, lame=0, nodup=1) at iterator/iter_delegpt.c:159
#4  0x08082e72 in delegpt_add_rrset_A (dp=0x15b61cb0, region=0x15b61b20, ak=0x10c4f160, lame=0, nodup=1) at iterator/iter_delegpt.c:410
#5  0x0808eace in cache_fill_missing (env=0xa0d9e4c, qclass=1, region=0x15b61b20, dp=0x15b61cb0) at services/cache/dns.c:233
#6  0x0808c669 in iter_handle (qstate=0x15b61b60, iq=0x15b61c38, ie=0xa11f108, id=1) at iterator/iterator.c:1410
#7  0x0808cf50 in iter_operate (qstate=0x15b61b60, event=module_event_pass, id=1, outbound=0x0) at iterator/iterator.c:2623
#8  0x08095b1e in mesh_run (mesh=0xa86a450, mstate=0x15b61b38, ev=module_event_pass, e=0x146e6d70) at services/mesh.c:1004
#9  0x08095e04 in mesh_report_reply (mesh=0xa86a450, e=0x146e6d70, reply=0xbf86dbac, what=-1210720268) at services/mesh.c:462
#10 0x080822fc in worker_handle_service_reply (c=0xa6800e8, arg=0x146e6d70, error=0, reply_info=0xbf86dbac) at daemon/worker.c:281
#11 0x08098c01 in serviced_callbacks (sq=0x13714588, error=0, c=0xa6800e8, rep=0xbf86dbac) at services/outside_network.c:1385
#12 0x08099cf7 in serviced_udp_callback (c=0xa6800e8, arg=0x13714588, error=0, rep=0xbf86dbac) at services/outside_network.c:1598
#13 0x0809acdb in outnet_udp_cb (c=0xa6800e8, arg=0xa3169e8, error=0, reply_info=0xbf86dbac) at services/outside_network.c:377
#14 0x080b2cb3 in comm_point_udp_callback (fd=41, event=<value optimized out>, arg=0xa6800e8) at util/netevent.c:613
#15 0xb7ef20b0 in event_base_loop (base=0xa0d8ea8, flags=0) at event.c:395
#16 0xb7ef2295 in event_base_dispatch (event_base=0xa0d8ea8) at event.c:415
#17 0x080b2e96 in comm_base_dispatch (b=0xa0d8e80) at util/netevent.c:250
#18 0x08080090 in worker_work (worker=0xa0d9450) at daemon/worker.c:1179
#19 0x080798a8 in daemon_fork (daemon=0xa0a5008) at daemon/daemon.c:475
#20 0x0807edd1 in main (argc=1404316, argv=0xb7eb5048) at daemon/unbound.c:636
Comment 6 Wouter Wijngaards 2010-08-18 18:22:31 CEST
Aha it is some corner case involving rs.ripe.net  which plays tricks to measure the EDNS message size that works.

Hum.  A couple more gcores for other points in that loop?  Is caps-for-id still on?

Best regards,   Wouter
Comment 7 Wouter Wijngaards 2010-08-18 18:27:22 CEST
Hi, It looks like I can reproduce it, thank you.
Comment 8 Günter Grodotzki 2010-08-18 18:32:17 CEST
ah ok cool perfect.

at that point id-caps was still on, but I deactivated it now temporarily.
Comment 9 Wouter Wijngaards 2010-08-18 18:55:04 CEST
Gunter, there is a patch in svn trunk of unbound (r2218) that probably removes the trouble (it does for me).  It may change when I get a better handle on this (i.e. when I read that 20Meg logfile I now have).

Best regards,   Wouter
Comment 10 Wouter Wijngaards 2010-08-19 15:01:58 CEST
Hi,

A comprehensive patch is in svn r2219.  It does not use up a lot of CPU any more.  (It does take forever because rs.ripe.net does not work with 0x20).

Best regards,   Wouter