Bug 839 - Memory grows unexpectedly with large RPZ files
Status: ASSIGNED
Product: unbound
Classification: Unclassified
Component: server
Version: 1.5.9
Hardware: x86_64 Linux
Importance: P5 normal
Assigned To: unbound team
Reported: 2016-09-29 08:37 CEST by John Todd
Modified: 2016-09-29 18:19 CEST (History)
Description John Todd 2016-09-29 08:37:30 CEST
We are running 1.5.9 on CentOS 7.  We have noticed this problem with custom-compiled builds (with DNSTAP).  The conf files are very close to stock, with only a few lines changed (I'm happy to share those if that would be valuable).

We have an RPZ file that is around 850k lines long (it is pulled into the conf as an "included" file) and about 38 megabytes in size.  Loading this file causes unbound to immediately spiral into failure: it uses >3.8G of memory and swaps the machine to a halt.  Since our resolver testbed only has 4G, this is catastrophic.  Giving the machine more memory does not seem like the correct solution either, since a 38M file should not produce 3.5G of memory usage.

I pruned the file to 600k zones - 2.87G of memory - this doesn't cause us to crash, but of course it's not using the full RPZ.

This memory use seems excessive.  I would expect this to be much smaller.  

It takes only a second or two to reach this unusable state.  During startups (or restarts) with the RPZ file in place, we also lose the ability to perform any recursive lookups - all requests time out with no replies.

This effectively prohibits the use of unbound with RPZ files, since we cannot update the RPZ or even load it without users going offline for DNS replies.

Am I finding a bug, or is this a design issue that has yet to be triggered by anyone with a suitably large RPZ file?

Note: I marked this as "normal" but it is a blocker for us, since we cannot use unbound in our environment without RPZ.  However, it is probably not a blocker for anyone else so I will mark it as "normal".
Comment 1 Wouter Wijngaards 2016-09-29 09:19:32 CEST
Hi John,

How does that RPZ feed into unbound?  So, does the RPZ output config with local-zone contents?  Try 1.5.10, it may have lower local-zone memory usage (but I do not know if that applies to your situation).  1.5.10 in places deletes the config read lines after applying them into the memory structures, and thus saves memory.

The issue is new to me.  Can you provide files so that I can reproduce the issue?

If you are running Unbound with some sort of RPZ-patch, then I guess that patch is where the bug may be?  (i.e. that dnstap patch is from farsight, and I think an RPZ patch may also be).

Best regards, Wouter
Comment 2 John Todd 2016-09-29 09:38:25 CEST
The RPZ feeds in as inclusions into the conf file tree. Files are in /etc/unbound/local.d/bigrpzfile.conf and then there is a line like:

        include: /etc/unbound/local.d/*.conf

in the main unbound.conf file.

Example lines from the bigrpzfile.conf:

	local-zone: "aakseihgma.ac." static
	local-zone: "aaywsslngalwif.ac." static
	local-zone: "abwcoddatypdfyllct.ac." static
	local-zone: "acslyjcp.ac." static

This isn't a patch - it's just using local-zones to provide an NXDOMAIN answer. Are we doing this in a way that is ill-suited to the application?
Comment 3 Wouter Wijngaards 2016-09-29 09:41:48 CEST
Hi John,

No this is fine.  But you have a lot of data.  Can you send me a sample (i.e. a long list of those static zones, enough to show the memory increase, no need to send it all)?

Best regards, Wouter
Comment 4 John Todd 2016-09-29 09:54:58 CEST
Sadly, I cannot - I am not currently permitted to share it as we do not generate it ourselves.

However, it should be easy to replicate - just create a file of strings (possibly just an increasing numeric counter as the domain component!) in the same format as I provided, but make the file 850,000 lines long.  After all, most RPZ entries are garbage domain name strings anyway that don't resolve for more than a few days.
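
A reproduction file along the lines suggested above could be generated with something like the following sketch. The function name and the "example." suffix are placeholders invented for this illustration; they are not taken from the reporter's feed or from unbound itself.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write `count` synthetic entries of the form
 *     local-zone: "<i>.example." static
 * to `out`, using an increasing counter as the domain label.
 * Hypothetical helper for reproducing the report; returns 0 on
 * success and -1 on a write error.
 */
int write_synthetic_local_zones(FILE *out, unsigned long count)
{
	unsigned long i;

	assert(out != NULL);
	for (i = 0; i < count; i++) {
		if (fprintf(out, "\tlocal-zone: \"%lu.example.\" static\n", i) < 0)
			return -1;
	}
	return fflush(out) == 0 ? 0 : -1;
}
```

Calling it with a file opened for writing and a count of 850000 should give a file of roughly the same line count as the one in the report. Depending on where the include: line sits in unbound.conf, the generated file may also need a leading "server:" line.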
Comment 5 Wouter Wijngaards 2016-09-29 09:57:17 CEST
Hi John,

I have identified what is using the memory: every local-zone has its own region for memory allocation.  These start with a 4K allocation.  That would use 4k per line in your rpz file.

Not sure what the prettiest fix is yet.  Perhaps a more global region for all the localzones?

Best regards, Wouter
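
As a rough check of the 4K-per-zone explanation above, the sketch below multiplies the zone count by 4096 bytes. The function and macro names are made up for this illustration, and the figure ignores all other per-zone bookkeeping, so it is only a lower bound.

```c
#include <assert.h>
#include <stdint.h>

/* First-chunk size per region, taken from the comment ("a 4K allocation"). */
#define FIRST_CHUNK_BYTES 4096ULL

/*
 * Lower bound on the memory consumed by the up-front region chunks
 * alone, one chunk per local-zone. Hypothetical helper, not unbound code.
 */
uint64_t region_overhead_bytes(uint64_t zone_count)
{
	/* Guard against overflow of the multiplication below. */
	assert(zone_count <= UINT64_MAX / FIRST_CHUNK_BYTES);
	return zone_count * FIRST_CHUNK_BYTES;
}
```

For 850,000 zones this gives 3,481,600,000 bytes (about 3.5 GB), in line with the usage reported for the full file. For the pruned 600,000-zone file it gives 2,457,600,000 bytes (about 2.5 GB), which accounts for most of the 2.87G observed there.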
Comment 6 Wouter Wijngaards 2016-09-29 10:00:45 CEST
Hi John,

Does this patch resolve the problem for you?  It delays the 4K allocation until you register actual RRs in the region, which for RPZ, you won't.

Index: services/localzone.c
===================================================================
--- services/localzone.c	(revision 3871)
+++ services/localzone.c	(working copy)
@@ -154,7 +154,7 @@
 	z->namelen = len;
 	z->namelabs = labs;
 	lock_rw_init(&z->lock);
-	z->region = regional_create();
+	z->region = regional_create_custom(sizeof(struct regional));
 	if(!z->region) {
 		free(z);
 		return NULL;

Best regards, Wouter
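
The idea behind the patch above, paying for the large chunk only when something is actually stored, can be illustrated with a toy region allocator. This is a simplified sketch and not unbound's regional implementation: all toy_* names are invented here, the region holds a single chunk, and an `eager` flag stands in for the choice between regional_create() and regional_create_custom().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_CHUNK_BYTES 4096	/* size of the one chunk a toy region may hold */
#define TOY_ALIGN 16		/* allocations are rounded up to this multiple */

/* A toy region (arena). This is NOT unbound's struct regional. */
struct toy_region {
	char *chunk;		/* NULL until the chunk is allocated */
	size_t used;		/* bytes handed out from the chunk */
	size_t reserved;	/* bytes obtained from malloc, header included */
};

/* Create a region; if `eager` is nonzero, allocate the chunk right away. */
struct toy_region *toy_region_create(int eager)
{
	struct toy_region *r = calloc(1, sizeof(*r));

	if (!r)
		return NULL;
	r->reserved = sizeof(*r);
	if (eager) {
		r->chunk = malloc(TOY_CHUNK_BYTES);
		if (!r->chunk) {
			free(r);
			return NULL;
		}
		r->reserved += TOY_CHUNK_BYTES;
	}
	return r;
}

/* Hand out `size` bytes, allocating the chunk on first use if needed. */
void *toy_region_alloc(struct toy_region *r, size_t size)
{
	size_t need;
	void *p;

	assert(r != NULL);
	if (size > TOY_CHUNK_BYTES)
		return NULL;	/* toy limit: a single chunk only */
	need = (size + TOY_ALIGN - 1) / TOY_ALIGN * TOY_ALIGN;
	if (!r->chunk) {
		/* First real use: this is where a lazy region pays for the chunk. */
		r->chunk = malloc(TOY_CHUNK_BYTES);
		if (!r->chunk)
			return NULL;
		r->reserved += TOY_CHUNK_BYTES;
	}
	if (need > TOY_CHUNK_BYTES - r->used)
		return NULL;	/* chunk exhausted */
	p = r->chunk + r->used;
	r->used += need;
	return p;
}

/* Release the chunk (if any) and the region header. */
void toy_region_destroy(struct toy_region *r)
{
	if (!r)
		return;
	free(r->chunk);
	free(r);
}
```

A region created with toy_region_create(0) reserves only its header until toy_region_alloc() is first called, so a zone that never stores any records never pays for the 4K chunk. That matches the behaviour described for static zones with no RRs.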
Comment 7 John Todd 2016-09-29 10:03:37 CEST
I don't have much to offer as a suggestion - I'm unfamiliar with the unbound code and I don't know what would be acceptable.

One thing that comes to mind is that perhaps a special suffixing character could be used to indicate that a zone should not get a memory allocation as they will never be larger than the string provided in the config file.  Perhaps "!"? 

Example:
	local-zone: "aakseihgma.ac.!" static
	local-zone: "aaywsslngalwif.ac.!" static
	local-zone: "abwcoddatypdfyllct.ac.!" static
	local-zone: "acslyjcp.ac.!" static

Of course, there may be cleaner ways of doing this that are not embedded in the zone name itself.

Best of all would be for a more robust RPZ implementation to find its way into unbound; there were discussions of that back in January...
Comment 8 John Todd 2016-09-29 10:04:08 CEST
Let me try the patch - perhaps that solves the issue.  Thanks for the quick testable code!
Comment 9 Wouter Wijngaards 2016-09-29 10:10:46 CEST
Hi John,

I think this fixes it: a file of 850,000 local-zone "$i.com" static entries now takes approx. 10 times less memory (7 GB -> 300 MB).  That should solve your issues I think.  I'll close the report (but feel free to send additional improvements).  The code is also available in the code repository.

Thank you for the report, we hadn't seen this issue before, and it likely has been really using a lot of memory for people with many block entries.

Best regards, Wouter
Comment 10 John Todd 2016-09-29 18:19:51 CEST
Yes, that is our result as well - this fixes the issue. Thanks!  Still hopeful for a fully-implemented RPZ method that has statistics and custom actions.