Bug 143 - nsdc stop doesn't stop all server processes on Solaris 10
Status: RESOLVED FIXED
Product: NSD
Classification: Unclassified
Component: NSD Code
Version: 3.0.x
Hardware: Sun other
Importance: P2 minor
Assigned To: NSD team
Depends on:
Blocks:
Reported: 2006-09-10 17:44 CEST by John Dickinson
Modified: 2006-09-26 15:33 CEST (History)
Attachments
nsd conf file (1.70 KB, text/plain)
2006-09-12 18:09 CEST, John Dickinson
nsd log file (5.37 KB, text/plain)
2006-09-12 18:09 CEST, John Dickinson

Description John Dickinson 2006-09-10 17:44:39 CEST
On Solaris 10, if nsd is started with several server processes, not all of them are stopped when nsdc stop is issued. I saw this with 7 or more server processes on Solaris 10 01/06 running on a T2000.
Comment 1 Wouter Wijngaards 2006-09-11 11:59:29 CEST
I have looked into this and tested on our SPARC Solaris machine (not Solaris 10 though),
and I have fixed some issues:
- nsdc would choke on a very old 'which' that returns bad exit codes. [Probably not your problem.]
- the zonesdir: directive combined with relative pathnames in the database:, pidfile: and difffile: statements.

If you used zonesdir: and a relative pidfile: statement, then this would cause nsdc stop to do nothing (it could not find a pidfile) and say 'nsd is not running'. Is this the case?
You can fix this by using full pathnames in the config file server: section.
I have also coded a fix in the svn repository. I think this will solve the issue.

Does this fix the issue? If not, can you try kill `cat ..../nsd.pid`? Does that work? If not, can you please run nsd with -L 2 to get verbose logging information (compile with --enable-checking), and provide the logs and config file?

Comment 2 John Dickinson 2006-09-12 17:57:48 CEST
Thanks for the suggestions.

I don't think that the patch will help with this issue, since some of the server processes do stop, just not all of them.

For example, start nsd with 7 servers:
bash-3.00$ ps -ef | grep nsd
     nsd 16299 16289   0 16:22:22 ?           0:01 ../../sbin/nsd -c /var/home/jad/opt/nsd-3.0.1-checking/etc/nsd/nsd.conf -L 2
     nsd 16296 16289   0 16:22:22 ?           0:01 ../../sbin/nsd -c /var/home/jad/opt/nsd-3.0.1-checking/etc/nsd/nsd.conf -L 2
     nsd 16294 16289   0 16:22:22 ?           0:01 ../../sbin/nsd -c /var/home/jad/opt/nsd-3.0.1-checking/etc/nsd/nsd.conf -L 2
     nsd 16300 16289   0 16:22:22 ?           0:01 ../../sbin/nsd -c /var/home/jad/opt/nsd-3.0.1-checking/etc/nsd/nsd.conf -L 2
     nsd 16297 16289   0 16:22:22 ?           0:01 ../../sbin/nsd -c /var/home/jad/opt/nsd-3.0.1-checking/etc/nsd/nsd.conf -L 2
     nsd 16295 16289   0 16:22:22 ?           0:01 ../../sbin/nsd -c /var/home/jad/opt/nsd-3.0.1-checking/etc/nsd/nsd.conf -L 2
     nsd 16298 16289   0 16:22:22 ?           0:01 ../../sbin/nsd -c /var/home/jad/opt/nsd-3.0.1-checking/etc/nsd/nsd.conf -L 2
     nsd 16293 16289   0 16:22:21 ?           0:08 ../../sbin/nsd -c /var/home/jad/opt/nsd-3.0.1-checking/etc/nsd/nsd.conf -L 2
     jad 16302  7594   0 16:23:27 pts/2       0:00 grep nsd
     nsd 16289     1   0 16:17:35 ?           4:47 ../../sbin/nsd -c /var/home/jad/opt/nsd-3.0.1-checking/etc/nsd/nsd.conf -L 2

Then query it a bit.

Then try to stop it, and you are left with:

bash-3.00$ ps -ef | grep nsd
     nsd 16296     1   0 16:22:22 ?           0:01 ../../sbin/nsd -c /var/home/jad/opt/nsd-3.0.1-checking/etc/nsd/nsd.conf -L 2
     jad 16323  7594   0 16:27:23 pts/2       0:00 grep nsd
     nsd 16293     1   0 16:22:21 ?           0:08 ../../sbin/nsd -c /var/home/jad/opt/nsd-3.0.1-checking/etc/nsd/nsd.conf -L 2   

BTW - The pid file has now been removed.

There doesn't seem to be anything in the log, but I will attach it next. I have removed most of the dname and query lines from the log because I sent 100000 queries to the server, so there were quite a lot of them. If you really want them, just let me know.
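
A minimal sketch for picking the leftover servers out of `ps -ef` output like the listing above. It assumes the Solaris column order (UID, PID, PPID, C, STIME, TTY, TIME, CMD) and that a leftover server is an nsd process re-parented to PID 1; the helper name `orphaned_nsd_pids` is made up for illustration. It is only meaningful after `nsdc stop` has run, since the main nsd process also has PPID 1 while the daemon is up.

```python
import os


def orphaned_nsd_pids(ps_output):
    """Return PIDs of nsd processes whose parent is init (PPID 1).

    Assumes `ps -ef` columns: UID PID PPID C STIME TTY TIME CMD.
    Limitation: an STIME such as "Sep 12" contains a space and would
    shift the columns, so this only handles same-day start times.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)
        # Skip the header line and anything that does not parse.
        if len(fields) < 8 or not fields[1].isdigit() or not fields[2].isdigit():
            continue
        pid, ppid, cmd = int(fields[1]), int(fields[2]), fields[7]
        # Match on the program name so "grep nsd" is not counted.
        program = os.path.basename(cmd.split()[0])
        if program == "nsd" and ppid == 1:
            pids.append(pid)
    return pids


sample = """\
     nsd 16296     1   0 16:22:22 ?           0:01 ../../sbin/nsd -c /var/home/jad/opt/nsd-3.0.1-checking/etc/nsd/nsd.conf -L 2
     jad 16323  7594   0 16:27:23 pts/2       0:00 grep nsd
     nsd 16293     1   0 16:22:21 ?           0:08 ../../sbin/nsd -c /var/home/jad/opt/nsd-3.0.1-checking/etc/nsd/nsd.conf -L 2
"""
print(orphaned_nsd_pids(sample))  # [16296, 16293]
```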
Comment 3 John Dickinson 2006-09-12 18:09:01 CEST
Created attachment 29
nsd conf file
Comment 4 John Dickinson 2006-09-12 18:09:32 CEST
Created attachment 30
nsd log file
Comment 5 Wouter Wijngaards 2006-09-15 09:47:14 CEST
Solaris 10 needs a patch to solve AF_UNIX issues.
NSD uses AF_UNIX pipes for interprocess communication, in particular to send QUIT to the server processes. The patch solves the problem of losing the last bytes of data when an AF_UNIX pipe is closed (on Solaris 10).

Please apply either patch #120664 (SPARC) or patch #120665 (x86). See
http://sunsolve.sun.com/pub-cgi/show.pl?target=patchpage
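
To illustrate the behaviour the patch restores, here is a minimal sketch of the write-then-close pattern on an AF_UNIX socket pair. It is not NSD's code: the function name `send_then_close` and the literal `b"QUIT"` payload are placeholders, and NSD's real command encoding is not shown in this report. On a correctly behaving system the reader still receives every byte written before the close; per the description above, an unpatched Solaris 10 could drop the last bytes, so a server would never see the command.

```python
import socket


def send_then_close(payload):
    """Write payload to one end of an AF_UNIX pair, close it, drain the other end."""
    writer, reader = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        writer.sendall(payload)
    finally:
        # Close right after writing, as a parent does when shutting down.
        writer.close()

    received = bytearray()
    try:
        while True:
            chunk = reader.recv(4096)
            if not chunk:  # empty read means the writer's end is closed
                break
            received.extend(chunk)
    finally:
        reader.close()
    return bytes(received)


# Placeholder command; data written before close() should still be readable.
print(send_then_close(b"QUIT") == b"QUIT")
```

Running this on the affected machine before and after applying the patch would be one rough way to check the symptom, although a single-process test like this may not reproduce the timing seen with real child processes.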
Comment 6 Wouter Wijngaards 2006-09-26 15:33:40 CEST
Checked that the bug does not show up on a Solaris 10 system with the patch installed.