Bug 4218 - segfault in dnssec-triggerd when laptop was multihomed with differing dns settings
segfault in dnssec-triggerd when laptop was multihomed with differing dns set...
Product: dnssec-trigger
Classification: Unclassified
Component: server
x86_64 Linux
: P5 enhancement
Assigned To: unbound team
Depends on:
  Show dependency treegraph
Reported: 2019-01-13 18:47 CET by Diane Trout
Modified: 2019-02-03 23:09 CET (History)
1 user (show)

See Also:

gdb bt full backtrace (4.31 KB, text/plain)
2019-01-13 18:47 CET, Diane Trout
journalctl log of one of the startup crashes while dual homed. (4.09 KB, text/plain)
2019-01-13 18:51 CET, Diane Trout
Make sure new string_list node->next is initialized to NULL (333 bytes, patch)
2019-01-16 06:34 CET, Diane Trout
Details | Diff
valgrind of /usr/sbin/dnssec-triggerd -v -d having dnssec-trigger-script --update-all called (12.67 KB, text/plain)
2019-01-17 06:56 CET, Diane Trout

Note You need to log in before you can comment on or make changes to this bug.
Description Diane Trout 2019-01-13 18:47:31 CET
Created attachment 543 [details]
gdb bt full backtrace

4217 is happened in the same condition.

dnssec-triggerd segfaulted when starting up under systemd. (Strangely it didn't crash when started from a logged in user session).

Due to an unexpected change I was getting DNS information from both my customized openwrt router (over ethernet) and my AT&T DSL modem (via wifi). My internal dns had been customized with my own local networks settings.
Comment 1 Diane Trout 2019-01-13 18:51:14 CET
Created attachment 544 [details]
journalctl log of one of the startup crashes while dual homed.
Comment 2 Wouter Wijngaards 2019-01-14 09:54:07 CET
Hi Diane,

Here is a fix for it.  But I would like to know more about how that NULL string get into the string list, because it should not be there, and is likely to cause other problems (eg crashes).  Perhaps (without the fix) a run in valgrind will produce a statement along the lines of 'use of date released previously here...', instead of the gdb backtrace.  Anyway, based on that gdb backtrace, the fix below should stop the crash from happening.

Best regards, Wouter

Index: riggerd/string_list.c
--- riggerd/string_list.c	(revision 796)
+++ riggerd/string_list.c	(working copy)
@@ -86,7 +86,7 @@
 		 * We already know size of both buffers, so we take advantage of that
 		 * and also of short-cut evaluation.
-		if (len == iter->length && strncmp(iter->string, value, len) == 0) {
+		if (iter->string && len == iter->length && strncmp(iter->string, value, len) == 0) {
 			return 1;
Comment 3 Diane Trout 2019-01-15 07:10:43 CET
Ok so my assumptions about why the bug was triggered might be wrong.

I'm attempting to update the Debian package for dnssec-trigger, and it's currently crashing during the upgrade process while it's trying to restart dnssec-triggerd.service using systemd. However my laptop is now only single homed (wifi is off) so it is likely to not be a problem with just the dns configuration.

The value of iter->string isn't null, but is instead some inaccessible memory address.

(gdb) l
84		for (iter = list->first; NULL != iter; iter = iter->next) {
85			/*
86			 * We already know size of both buffers, so we take advantage of that
87			 * and also of short-cut evaluation.
88			 */
89			if (len == iter->length && strncmp(iter->string, value, len) == 0) {
90				return 1;
91			}
92		}
93		return 0;
(gdb) p iter->string
Cannot access memory at address 0x105010300050608

We (Debian) did try to unhardcode configuration paths and move all of the configuration files into /etc/dnssec-trigger/ instead of just leaving them in /etc, and I've kept find configuration files in the wrong places for quite some time.  (If you're interested in the patch, its here. https://salsa.debian.org/dns-team/dnssec-trigger/blob/master/debian/patches/0006-Unhardcode-most-of-the-paths.patch ) 

But I don't know if its the path changes, or the systemd environment or if its something else that's causing it to crash.
Comment 4 Diane Trout 2019-01-15 08:11:39 CET
Aw I wish bugzilla remembered my text when there was an error with the submit.

I think the problem likes in the json parsing code for update_all. dnssec-triggerd -d doesn't crash until I run:

$ /usr/lib/dnssec-trigger/dnssec-trigger-script --update_all
Running update all with these connections:
    "connections": [
            "default": true,
            "servers": [
            "type": "other",
            "zones": [

and 	struct nm_connection_list original =  yield_connections_from_json(json);

contains the list with the bad value.
Comment 5 Diane Trout 2019-01-16 06:34:00 CET
Created attachment 549 [details]
Make sure new string_list node->next is initialized to NULL

It didn't look like calloc_or_die initialized the memory for new the new node to 0, so I'm pretty sure this is needed.

It certainly has stopped crashing when running 
 sudo /usr/lib/dnssec-trigger/dnssec-trigger-script --update-all
when FWD_ZONES_SUPPORT is enabled
Comment 6 Wouter Wijngaards 2019-01-16 07:30:16 CET
Hi Diane,

I'll include the patch because it stops crashes for you.
But the calloc libc call initializes all memory to zeroes.  So it should already be zero.  It does not hurt to set to NULL again.  But it should also not be necessary.  I wonder why that stopped the crashes for you.

Thanks for the tests.

Best regards, Wouter
Comment 7 Diane Trout 2019-01-17 06:53:55 CET
You're right calloc does initialize the memory to zeros. I changed my assignment to assert(NULL == (*node)->next); and it still worked.

I was tricked by it not crashing 100% of the time.

If I call "usr/lib/dnssec-trigger/dnssec-trigger-script --update-all" several times it eventually crashes..

I'll attach the valgrind output.
Comment 8 Diane Trout 2019-01-17 06:56:00 CET
Created attachment 554 [details]
valgrind of /usr/sbin/dnssec-triggerd -v -d having dnssec-trigger-script --update-all called
Comment 9 Wouter Wijngaards 2019-01-17 10:10:19 CET
Hi Diane,

This looks like the problem, according to the (excellent) valgrind output you posted.  The new_zone was added in nm_connection_list_push_back to the list, but then freed with that statement, the push_back did not copy but add the actual pointer to that list.  Then later, when it tried to use the string inside that new_zone item, it failed on the string.

Does this solve the problem? I think it will, with that valgrind output.

Best regards, Wouter

Index: riggerd/svr.c
--- riggerd/svr.c	(revision 796)
+++ riggerd/svr.c	(working copy)
@@ -1014,7 +1014,6 @@
 				nm_connection_list_push_back(&forward_zones, new_zone);
 				store_add(&stored_zones, zone->string, zone->length);
-				free(new_zone);
 			} else {
 				if (nm_connection_list_contains_zone(&forward_zones, zone->string, zone->length)) {
 					verbose(VERB_DEBUG, "Iter over reverse zones: %s remove from unbound local zones", zone->string);
Comment 10 Diane Trout 2019-02-03 23:09:39 CET
Seems to work for me.

Let me push it to Debian unstable and see if it fixes the problems others are having.