Bug 519 - ub_ctx_delete may hang in some scenarios (libunbound)
ub_ctx_delete may hang in some scenarios (libunbound)
Status: RESOLVED FIXED
Product: unbound
Classification: Unclassified
Component: server
1.4.20
Other Linux
: P5 minor
Assigned To: unbound team
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2013-08-16 22:27 CEST by Brian Julin
Modified: 2013-08-27 09:04 CEST (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Brian Julin 2013-08-16 22:27:34 CEST
ub_ctx_delete unconditionally calls tube_read_msg with the nonblock parameter set to 0.  When used via libunbound, any application that tries to clean up (for valgrind purposes or just due to automatic destructors) may encounter a situation where all threads received a fatal signal and the worker threads have since gone defunct, e.g. if an administrator invoked "killall."  The read will never return.  The process is left hanging in the call with unjoined zombie worker threads.
Comment 1 Wouter Wijngaards 2013-08-19 14:03:45 CEST
Hi Brian,

This is the patch I propose for this issue.  It closes the write end of the pipe it uses when it tries to fork , so that when you call delete, it'll read from a closed pipe (since the other process has been killed with a signal).  And that returns instantly and then your problem is solved?

Can you test if this solves your problem?  The code has also been committed to svn trunk of unbound.

Best regards, Wouter



Index: libunbound/libworker.c
===================================================================
--- libunbound/libworker.c	(revision 2936)
+++ libunbound/libworker.c	(working copy)
@@ -372,6 +372,11 @@
 			case -1:
 				return UB_FORKFAIL;
 			default:
+				/* close non-used parts, so that the worker
+				 * bgprocess gets 'pipe closed' when the
+				 * main process exits */
+				tube_close_read(ctx->qq_pipe);
+				tube_close_write(ctx->rr_pipe);
 				break;
 		}
 #endif /* HAVE_FORK */
Comment 2 Brian Julin 2013-08-23 22:31:10 CEST
This patch does seem to fix the case where workers are processes.  I applied it to a (modified to get debugging symbols) 1.4.20-1 debian source to test it.

In the threaded case it of course has no effect.

Note that in the threaded case, the application has no access to the thread IDs and so cannot necessarily exempt them from a commonly seen policy of killing all but the parent thread when a signal is received.  The parent thread usually then does cleanup in such a scenario, and will still end up calling ub_ctx_delete and blocking on the tube, since the worker end of the tube is dead.  Since the fds are commonly shared in this case the problem can't be solved by closing of fds.

That, and a lot of the time support for libraries like libunbound are done in a loadable module framework where messing with the process signal handlers is unlikely to be within the expected purview of a module.

While writing all the internals in such a way as to prevent any API calls from blocking due to spurious thread deaths is likely excessive, the case of ub_ctx_delete is special in that it will quite often be called in such circumstances.
Comment 3 Wouter Wijngaards 2013-08-26 14:25:58 CEST
Hi Brian,

I have attempted to fix the issue for threads too.  It tests if the thread has been killed before it hangs trying to contact it.  Can you see if that works.

You'll need to update libunbound.c, from http://unbound.net/svn/trunk/libunbound/libunbound.c or using svn, if you want to test this.

Best regards,
   Wouter
Comment 4 Brian Julin 2013-08-26 17:36:08 CEST
Yes, that seems to have done the trick.

Thank you very much.
Comment 5 Wouter Wijngaards 2013-08-27 09:04:30 CEST
Hi Brian,

Closing this issue, contact us again if there is anything else!

Best regards, Wouter