Bug 637 - nsd.db grows limitlessly
Status: RESOLVED FIXED
Product: NSD
Classification: Unclassified
Component: NSD Code
Version: 4.1.x
Hardware: x86_64 Linux
Importance: P5 major
Assigned To: NSD team
Duplicates: 636
Reported: 2014-12-15 18:38 CET by Anand Buddhdev
Modified: 2015-01-09 16:23 CET

Description Anand Buddhdev 2014-12-15 18:38:31 CET
Just when I thought NSD was behaving itself, it has fallen flat on its face on our servers this afternoon.

Some time ago, I switched from the "nodb" mode back to the "db" mode, now that the memory problems have been solved on Linux.

Our nsd.db files used to be around 25 GB in size. However, on both of our servers, when we looked today, these files are 261 GB in size! This filled up the partition where the db file is located. And then when NSD was unable to write to the nsd.db file, it corrupted it, and crashed. This has caused havoc.

I think I am going to go back to the "nodb" mode, because that is very stable, whereas the db mode still seems to have bugs. I have preserved the 261 GB nsd.db file, along with the xfrd.state file, and the XFRs that had still not been applied, if you want it... :)

The logs don't reveal anything interesting, but I'm happy to show them to you if it helps.
Comment 1 Willem Toorop 2014-12-15 19:03:39 CET
(In reply to Anand Buddhdev from comment #0)
> Just when I thought NSD was behaving itself, it has fallen flat on its face
> on our servers this afternoon.
> 
> Some time ago, I switched from the "nodb" mode back to the "db" mode, now
> that the memory problems have been solved on Linux.
> 
> Our nsd.db files used to be around 25 GB in size. However, on both of our
> servers, when we looked today, these files are 261 GB in size! This filled
> up the partition where the db file is located. And then when NSD was unable
> to write to the nsd.db file, it corrupted it, and crashed. This has caused
> havoc.

Ouch.. Sorry about that.  Our recommended modus operandi is the nodb mode now, because of the much lower memory usage and the negligible increase in zone load time.

> I think I am going to go back to the "nodb" mode, because that is very
> stable, whereas the db mode still seems to have bugs. I have preserved the
> 261 GB nsd.db file, along with the xfrd.state file, and the XFRs that had
> still not been applied, if you want it... :)

Yes please!  If you send me a public ssh key I'll give you access on a host to transfer the files.  Or do you prefer another method?
 
> The logs don't reveal anything interesting, but I'm happy to show them to
> you if it helps.

If they don't reveal anything I don't think we need them.

Thanks for reporting, Anand,

-- Willem
Comment 2 Willem Toorop 2015-01-08 17:14:52 CET
*** Bug 636 has been marked as a duplicate of this bug. ***
Comment 3 Wouter Wijngaards 2015-01-09 16:23:00 CET
Hi Anand,

Thank you very much for the large database file; with it I was able to find the flaw.  It was a leak of one-megabyte chunks, caused by an off-by-one error in the code.  Large AXFRs can create one-megabyte regions of free space which, when contiguous, would trigger this issue under certain conditions.  One megabyte is the maximum size for the internal chunks in the file, and that is where the off-by-one hits: chunks of exactly that size were omitted from the checks that look for free space to reuse.  With that space never reused, the nsd.db file grows.  The nsd.db file was otherwise fine.

As you can imagine for off-by-one flaws, the fix is only two characters ... (< to <= in two places).
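
For readers who want to see the shape of such a flaw, here is a minimal sketch of a size check that is off by one at the maximum chunk size. It is not the actual NSD code: the names MAX_CHUNK, can_reuse_buggy, can_reuse_fixed and file_size_after are hypothetical, and the "file" is only a byte counter. Each round allocates one chunk and frees it again; if the freed chunk is not accepted for reuse, the file grows.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for illustration: 1 MiB is the largest internal chunk size. */
#define MAX_CHUNK ((size_t)1 << 20)

typedef int (*reuse_check)(size_t chunk_size);

/* Buggy check: a free chunk of exactly MAX_CHUNK is never accepted. */
static int can_reuse_buggy(size_t chunk_size)
{
    return chunk_size < MAX_CHUNK;
}

/* Fixed check: "<" became "<=", so the largest chunks are reusable too. */
static int can_reuse_fixed(size_t chunk_size)
{
    return chunk_size <= MAX_CHUNK;
}

/*
 * Allocate and then free one chunk of chunk_size per round, and return
 * the resulting file size in bytes.
 */
static size_t file_size_after(unsigned rounds, size_t chunk_size,
                              reuse_check can_reuse)
{
    size_t file_size = 0;
    int have_free_chunk = 0;
    unsigned i;

    assert(chunk_size > 0 && chunk_size <= MAX_CHUNK);

    for (i = 0; i < rounds; i++) {
        if (have_free_chunk && can_reuse(chunk_size)) {
            have_free_chunk = 0;      /* reuse the previously freed chunk */
        } else {
            file_size += chunk_size;  /* nothing reusable: append to the file */
        }
        have_free_chunk = 1;          /* the chunk is freed again after use */
    }
    return file_size;
}
```

Under these assumptions, calling file_size_after(10, MAX_CHUNK, can_reuse_buggy) yields ten megabytes, while the same call with can_reuse_fixed stays at one megabyte. Chunks smaller than the maximum are reused by both checks, which would be consistent with the problem only showing up once large AXFRs produce maximum-sized free chunks.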

Best regards,
   Wouter