This is too good to pass or edit so I am incorporating the whole “wiki item on runit and postfix”
Having read the message on DNG by Steve Litt about process supervisors :-
I was inspired to investigate for myself. This is not a recipe on how to use runit as your init system, but some notes on using runit as a stage 2 process supervisor.
Of the supervisors mentioned by Steve, only runit was available in the jessie repository so that is what I had a look at.
The set up on ascii is slightly different from jessie. On jessie, once installed, it is all ready to run as a process supervisor. On ascii there is a separate package to install to set up this supervisory role. It would appear that the ascii version is more ready to be run as a replacement init.
On jessie :-
apt-get install runit
On ascii :-
apt-get install runit runit-sysv
After this, runsvdir can be seen in the process list, with a run of dots “…..” as space for displaying log messages,
although it is not actually supervising anything yet.
The ascii installation seems to be missing some of the documentation at :-
in that the info on the separate programs is missing, although the index.html
page does give the link to the author’s pages :-
Also on ascii the directory /etc/service is a link to /etc/runit/default.
Once runit is installed, then comes the job of setting up some daemons to be supervised. They are set up in sub-directories of /etc/sv and when everything is ready a link is set up in /etc/service to the sub-directory in /etc/sv. At this point runsvdir should notice and start running the daemon.
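The link step described above can be tried out safely in a scratch directory before touching /etc. This is only a sketch: enable_service is a hypothetical helper (it is not a runit command), and the temporary sv and service directories merely stand in for /etc/sv and /etc/service.

```shell
#!/bin/sh
# Sketch only: mimics the /etc/sv -> /etc/service layout in a scratch
# directory so that nothing on the real system is touched.
# enable_service is a hypothetical helper, not part of runit.

enable_service() {
    # $1 = directory holding service definitions (stands in for /etc/sv)
    # $2 = directory scanned by runsvdir (stands in for /etc/service)
    # $3 = service name
    sv_dir=$1
    scan_dir=$2
    name=$3
    if [ ! -x "$sv_dir/$name/run" ]; then
        echo "refusing to enable $name: no executable run file" >&2
        return 1
    fi
    ln -s "$sv_dir/$name" "$scan_dir/$name"
}

root=$(mktemp -d)
mkdir -p "$root/sv/demo" "$root/service"
printf '#!/bin/sh\nexec sleep 60\n' > "$root/sv/demo/run"
chmod a+x "$root/sv/demo/run"

enable_service "$root/sv" "$root/service" demo
ls -l "$root/service"
```

Checking for an executable “run” file before making the link is just a precaution, because the link is the point at which runsvdir starts the service.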
I wanted something simple to start with and went for the display manager. This is very simple, although there is some scope for fun as this can crash X while you are finishing the set-up, so it is good to know what you need to do to finish things. One difference from sysv init is that the daemon should not run in the background, but should remain connected to stdin/stdout. In this case, it means NOT using the “-d” switch to the display manager.
While on jessie I am using “slim” with lxde, on ascii I am using “lxdm” with lxde with some components of lxqt.
There needs to be a file called “run” to run the daemon. It is also possible to set up logging, using svlogd. This uses a sub-directory called log and its own “run” file. The supplied examples show logs being written under the log sub-directory, that is, within the /etc filesystem. I was not too happy with this and set up a separate directory in /var. Presumably you could log in /var/log, but I avoided that in case there was any interaction with logrotate, as svlogd looks after its own logs.
These are the steps I took to set up lxdm under runit on ascii :-
mkdir -p /var/svlogd/lxdm
adduser --system log
chown -R log /var/svlogd
cd /etc/sv
mkdir -p lxdm/log
cd lxdm
cat << EOF > run
#!/bin/sh
exec 2>&1
exec /usr/sbin/lxdm
EOF
cd log
cat << EOF > run
#!/bin/sh
exec chpst -ulog svlogd -tt /var/svlogd/lxdm
EOF
cd ..
chmod a+x run log/run
cd /etc/service
update-rc.d lxdm disable
ln -s /etc/sv/lxdm .
This failed to stop X, but was all set up, so I rebooted and it all just worked.
Looking at the output from ps :-
ps uaxf
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
...
root 2398 0.0 0.0 4212 1076 ? Ss 15:44 0:00 runsvdir -P /etc/service log: ....................................................................
root 2405 0.0 0.0 4060 672 ? Ss 15:44 0:00 \_ runsv lxdm
log 2406 0.0 0.0 4204 684 ? S 15:44 0:00 \_ svlogd -tt /var/svlogd/lxdm
root 2407 0.0 0.0 43976 3604 ? S 15:44 0:00 \_ /usr/sbin/lxdm-binary
root 2412 0.7 0.8 389104 71868 tty7 Ssl+ 15:44 0:27 \_ /usr/lib/xorg/Xorg :0 vt07 -nolisten tcp -novtswitch -auth /var/run/lxdm/lxdm-:0.auth
root 2425 0.0 0.0 49976 3256 ? S 15:44 0:00 \_ /usr/lib/lxdm/lxdm-session
xxxxxx 2506 0.0 0.1 352720 13280 ? Ssl 15:44 0:00 \_ /usr/bin/lxsession -s LXDE -e LXDE
xxxxxx 2559 0.0 0.0 11076 332 ? Ss 15:44 0:00 \_ /usr/bin/ssh-agent /usr/bin/startlxde
xxxxxx 2568 0.0 0.2 196632 16580 ? S 15:44 0:01 \_ openbox --config-file /home/xxxxxx/.config/openbox/lxde-rc.xml
xxxxxx 2573 0.1 0.3 910200 25920 ? Sl 15:44 0:04 \_ lxpanel --profile LXDE
xxxxxx 2788 0.0 0.0 36816 4668 ? S 15:45 0:00 | \_ urxvt
xxxxxx 2789 0.0 0.0 39236 2636 ? S 15:45 0:00 | | \_ urxvt
xxxxxx 2790 0.0 0.0 18396 3468 pts/0 Ss 15:45 0:00 | | \_ bash
xxxxxx 3668 0.5 0.5 327340 47736 pts/0 Sl 16:38 0:03 | | \_ emacs runit
xxxxxx 3820 0.0 0.0 17656 2068 pts/0 R+ 16:48 0:00 | | \_ ps uaxf
xxxxxx 2990 0.0 0.4 374260 39964 ? Sl 15:59 0:01 | \_ claws-mail
xxxxxx 3306 3.8 3.1 857764 252200 ? Sl 16:21 1:02 | \_ palemoon
xxxxxx 2574 0.0 0.5 602248 47304 ? S 15:44 0:00 \_ spacefm --desktop
The log file, /var/svlogd/lxdm/current, is empty, unlike my slim set-up under jessie, where the log file has output from Xorg as it starts up.
The set-up for slim on jessie is the same, simply replacing the word “lxdm” with “slim” everywhere.
Running the display manager under runit was very straightforward. I also tried something more complex. As a domestic user I use a mail provider, GMX in my case. However, some system messages are generated locally for delivery to root. Although I recall that exim had been installed, I had earlier experience with postfix, so I had installed that instead and configured it for my requirements and it was working well. This was obviously a prime target for trying out under runit!
Looking at /etc/init.d/postfix I found nearly 1000 lines of impenetrable script. Whereas, I thought that the actual command to start postfix was :-
and the actual daemon which is left running is “/usr/lib/postfix/master -w”.
There is an example “run” file in the documentation and I tried it
#!/bin/sh
exec 1>&2
daemon_directory=/usr/lib/postfix \
command_directory=/usr/sbin \
config_directory=/etc/postfix \
queue_directory=/var/spool/postfix \
mail_owner=postfix \
setgid_group=postdrop \
/etc/postfix/postfix-script check || exit 1
exec /usr/lib/postfix/master
This did not work well. It appeared to be checking every single file on the system and spotting that most were not owned by postfix! It also failed to run the daemon.
After some fiddling around I decided to try and keep it simple. Since “postfix start” does actually work, why not use it? Well it finishes and returns, which is not how runit works. I therefore set up a simple kludge, /etc/sv/postfix/run :-
#!/bin/sh
exec 1>&2
/usr/sbin/postfix start
while /usr/sbin/postfix status
do
sleep 300
done
This does actually work! It also puts out a timestamp every 5 minutes in the log file.
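The “start, then poll” pattern of this kludge can be tried without postfix using stand-in commands. This is only a sketch: fake_start, fake_status and hold_while are hypothetical names made up for the illustration, and the flag file merely plays the part of the running daemon.

```shell
#!/bin/sh
# Sketch of the "start, then poll" pattern with stand-in commands.
# fake_start and fake_status are hypothetical stand-ins for
# "postfix start" and "postfix status"; the flag file is the "daemon".
flag=$(mktemp)

fake_start()  { : > "$flag"; }      # the "daemon" detaches and is now up
fake_status() { [ -e "$flag" ]; }   # succeeds while the "daemon" is up

# hold_while CMD INTERVAL: block for as long as CMD keeps succeeding,
# checking every INTERVAL seconds.
hold_while() {
    while "$1"; do
        sleep "$2"
    done
}

fake_start
( sleep 1; rm -f "$flag" ) &        # simulate the daemon dying after 1s
hold_while fake_status 1
echo "status check failed, the run script would exit here"
wait
```

The interval is a trade-off: with the 300 seconds used in the real “run” file, a dead daemon may go unnoticed for up to five minutes before “run” exits.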
I was not sure how things would work at shutdown, so I also set up a “finish” file. This is described as being executed if “run” stops.
#!/bin/sh
echo "calling /etc/sv/postfix/finish"
/usr/sbin/postfix stop
My hope is that if things are not shut down properly, then this would tidy it all up! What I see in /var/log/mail.info is that the “master” daemon reports
postfix/master: terminating on signal 15
postfix/postfix-script: fatal: the Postfix mail system is not running
I think that the second line is saying that it can’t stop as it isn’t running.
Postfix does its own logging via syslog.
This simple set up seems to work ok, although not really in the spirit of runit!
Postfix and shutdown
There is some further documentation on runit at Gentoo :-
including some example run scripts to be found at :-
thus they offer a postfix run file :-
I added in -d and corrected the paths. This does seem to work, although I notice that the sub-processes also have the -d flag.
#!/bin/sh -eu
/usr/sbin/postfix check
exec /usr/lib/postfix/master -d
However, “master” is in itself a process supervisor which looks after the various postfix programs. I think that this leads into questions that have been rising in my mind about shutdown. It is not very clear from the documentation how shutdown works if you only use runit as a process supervisor. As far as I can see the shutdown mechanism would use stage 3 of runit, if you were using runit as init. (As a supervisor, it is only stage 2).
With the run file running “master” directly (with the -d flag), as above, then on shutdown, /var/log/mail.info reports
postfix/master: fatal: master_sigdeath: kill process group: No such process
postfix/postfix-script: fatal: the Postfix mail system is not running
I think that the first line is “master” receiving a signal to stop. From /var/log/syslog I can see that at shutdown ntpd (which is not under runit) exits on signal 15 (TERM), so I would guess that that is what “master” has received. The second line above is probably from “finish”, which is being executed after “run” finishes and is therefore correct! The man page for “shutdown” says that “All processes are first notified that the system is going down by the signal SIGTERM.” I have assumed that any daemons run by SysV init would have their “K” scripts in /etc/rc0.d run at shutdown, although I can’t currently find the documentation on that!
As the couple of daemons that I am trying under runit are not part of the SysV init, I assume that they just receive SIGTERM.
I am wondering whether the answer is to set up runit’s stage 3 and somehow trigger it at shutdown, early enough that the daemons get the signal before they get a SIGTERM from elsewhere. The sample stage 3 file for Debian Sarge looks like this :-
#!/bin/sh
exec 2>&1
PATH=/command:/sbin:/bin:/usr/sbin:/usr/bin
LAST=0
test -x /etc/runit/reboot && LAST=6
echo 'Waiting for services to stop...'
sv -w196 force-stop /etc/service/*
sv exit /etc/service/*
echo 'Shutdown...'
/etc/init.d/rc $LAST
Since runit is not in charge, presumably running /etc/init.d/rc $LAST can be commented out. I also removed /command from the path.
Maybe this stage 3 file could be triggered as K010runit3 in /etc/rc0.d and /etc/rc6.d to get these daemons to stop cleanly.
cd /etc/rc0.d
ln -s /etc/runit/3 K010runit3
cd /etc/rc6.d
ln -s /etc/runit/3 K010runit3
Also I looked at whether it is possible to have runsv catch the SIGTERM and handle it more gracefully. The man page for runsv tells how to customise control signals, so that for SIGTERM :-
/etc/sv/postfix/control/t could contain :-
#!/bin/sh
echo We have SIGTERM for postfix
/usr/sbin/postfix stop
so this should run for “terminate” and should also pick up the control signals for “down” and “exit”, according to the man page for runsv.
This does run, and its message does appear in /var/svlogd/postfix/current, when I give the command
sv stop postfix
I reverted the postfix “run” file to the earlier one which uses the “postfix start” command and
with stage 3 being called from rc0.d and with control/t set up, then on shutdown I can see in /var/log/syslog :-
Dec 4 10:49:58 fluorine shutdown: shutting down for system halt
Dec 4 10:49:58 fluorine init: Switching to runlevel: 0
Dec 4 10:49:59 fluorine avahi-daemon: Got SIGTERM, quitting.
Dec 4 10:49:59 fluorine avahi-daemon: Leaving mDNS multicast group on interface xenbr0.IPv6 with address fe...
Dec 4 10:49:59 fluorine avahi-daemon: Leaving mDNS multicast group on interface xenbr0.IPv4 with address 19...
Dec 4 10:49:59 fluorine avahi-daemon: avahi-daemon 0.6.31 exiting.
Dec 4 10:50:00 fluorine postfix/master: terminating on signal 15
Dec 4 10:50:00 fluorine ntpd: ntpd exiting on signal 15
Dec 4 10:50:00 fluorine postfix/postfix-script: fatal: the Postfix mail system is not running
Dec 4 10:50:00 fluorine postfix/postfix-script: fatal: the Postfix mail system is not running
Dec 4 10:50:01 fluorine acpid: exiting
Dec 4 10:50:03 fluorine rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="2166" x-info="http://www.rsyslog.com"] exiting on signal 15.
I think that the 2 fatal messages are from “finish” and control/t being called after “master” has already received SIGTERM, although the echo messages are not appearing in /var/svlogd/postfix/current; maybe svlogd has already been shut down!
I can see that before I started trying runit and with postfix being started from SysV init, that on shutdown /var/log/mail.info reports :-
Nov 11 12:23:20 fluorine postfix/master: terminating on signal 15
So the set up with “runit” seems to produce more or less the same result as with SysV init. Setting up to run stage 3 from rc0.d and also setting up control/t does not appear to have any real effect on shutdown.
The actual file set up in /etc/sv/postfix looks like this :-
# find /etc/sv/postfix/ -name '*' -ls
651625 4 drwxr-xr-x 5 root root 4096 Dec 3 17:41 /etc/sv/postfix/
652292 4 -rwxr-xr-x 1 root root 71 Nov 24 14:31 /etc/sv/postfix/finish
652288 4 -rwxr-xr-x 1 root root 121 Dec 3 17:41 /etc/sv/postfix/run
656409 4 drwxr-xr-x 2 root root 4096 Dec 3 17:04 /etc/sv/postfix/control
661485 4 -rwxr-xr-x 1 root root 66 Dec 3 17:04 /etc/sv/postfix/control/t
652295 4 drwxr-xr-x 3 root root 4096 Dec 1 16:41 /etc/sv/postfix/log
661408 4 -rwxr-xr-x 1 root root 58 Dec 1 16:38 /etc/sv/postfix/log/run
661426 4 drwx------ 2 root root 4096 Dec 4 13:48 /etc/sv/postfix/log/supervise
661458 0 -rw------- 1 root root 0 Dec 1 16:41 /etc/sv/postfix/log/supervise/lock
661461 4 -rw-r--r-- 1 root root 20 Dec 4 13:48 /etc/sv/postfix/log/supervise/status
661477 0 prw------- 1 root root 0 Dec 1 16:41 /etc/sv/postfix/log/supervise/ok
661728 4 -rw-r--r-- 1 root root 4 Dec 4 13:48 /etc/sv/postfix/log/supervise/stat
652424 4 -rw-r--r-- 1 root root 5 Dec 4 13:48 /etc/sv/postfix/log/supervise/pid
652426 0 prw------- 1 root root 0 Dec 1 16:41 /etc/sv/postfix/log/supervise/control
652309 4 drwx------ 2 root root 4096 Dec 4 13:48 /etc/sv/postfix/supervise
654580 0 -rw------- 1 root root 0 Nov 23 11:20 /etc/sv/postfix/supervise/lock
661489 4 -rw-r--r-- 1 root root 20 Dec 4 13:48 /etc/sv/postfix/supervise/status
661277 0 prw------- 1 root root 0 Nov 23 11:20 /etc/sv/postfix/supervise/ok
661390 4 -rw-r--r-- 1 root root 4 Dec 4 13:48 /etc/sv/postfix/supervise/stat
661388 4 -rw-r--r-- 1 root root 5 Dec 4 13:48 /etc/sv/postfix/supervise/pid
656413 0 prw------- 1 root root 0 Dec 3 17:17 /etc/sv/postfix/supervise/control
The supervise sub-directories are automatically set up by runsv.
In the Gentoo info I spotted another method of making “run” hang for a daemon (rpc.nfsd) that detaches.
The README file for the rpc.nfsd “run” says :-
“rpc.nfsd is a “fake” service implemented using lock-wedging. Doing it this way allows sv stop rpc.nfsd to work, by stopping it in the finish script.”
After it has started the daemon, which detaches, the run file then does this :-
exec chpst -L supervise/runlock chpst -l supervise/runlock true
The change process state program (chpst) uses the -L switch to open the file supervise/runlock for writing, and obtain an exclusive lock on it. It then runs another copy of chpst which similarly gets a lock on the same file and then runs and returns “true”.
The difference is that -L will fail immediately if it cannot get the lock, while -l will wait until it can get the lock. Since the outer chpst already holds the lock, this line hangs until “sv stop postfix” is given; “run” is then terminated so that “finish” is executed, which is set up to actually stop postfix. This does work as described, once the control/t file, which was by-passing the killing of the “run” process, was moved out of the way. The normal shutdown procedure continues as before.
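The wedging effect can be seen on a machine without runit by using flock(1) from util-linux as a stand-in for chpst. This is only an analogy and an assumption on my part, not what the Gentoo script does: “flock -n” plays the part of “chpst -L” and plain “flock” the part of “chpst -l”. The inner call is given a one second timeout (-w 1) purely so that the demonstration terminates.

```shell
#!/bin/sh
# Sketch of the lock-wedging idea using flock(1) instead of chpst.
#   flock -n  ~ chpst -L  (fail at once if the lock is already held)
#   flock     ~ chpst -l  (wait until the lock can be obtained)
# The inner call uses -w 1 (give up after 1 second) only so that this
# demo terminates; the real run file waits indefinitely.
command -v flock >/dev/null 2>&1 || { echo "flock not installed"; exit 0; }

lock=$(mktemp)

# The outer flock takes the lock; the inner one wants the same lock and
# so cannot get it while the outer one is still holding it.
if flock -n "$lock" flock -w 1 "$lock" true; then
    wedged=no
else
    wedged=yes
fi
echo "inner lock attempt blocked: $wedged"

# Once the outer holder has gone, the lock is free again.
if flock -n "$lock" true; then
    echo "lock is free again"
fi
rm -f "$lock"
```

Without the timeout the inner command would simply sit there, which is exactly what keeps “run” alive under runsv until it is told to stop.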
The set up is now fairly simple as it is only necessary to set up “run” & “finish” along with the logging.
cd /etc/sv/postfix
cat << EOF > run
#!/bin/sh
exec 2>&1
echo "executing /etc/sv/postfix/run"
/usr/sbin/postfix start
exec chpst -L supervise/runlock chpst -l supervise/runlock true
EOF
cat << EOF > finish
#!/bin/sh
exec 2>&1
echo "calling /etc/sv/postfix/finish"
/usr/sbin/postfix stop
EOF
mkdir log
cat << EOF > log/run
#!/bin/sh
exec chpst -ulog svlogd -tt /var/svlogd/postfix
EOF
chmod a+x run finish log/run
The part of the output of ps uaxf relating to the postfix daemon looks like this :-
root 2767 0.0 0.0 4100 696 ? Ss 09:02 0:00 \_ runsv postfix
log 2768 0.0 0.0 4244 708 ? S 09:02 0:00 \_ svlogd -tt /var/svlogd/postfix
root 2771 0.0 0.0 4104 652 ? S 09:02 0:00 \_ chpst -l supervise/runlock true
root 2903 0.0 0.0 36168 3152 ? Ss 09:02 0:00 /usr/lib/postfix/master -w
postfix 2904 0.0 0.0 38232 3884 ? S 09:02 0:00 \_ pickup -l -t unix -u -c
postfix 2905 0.0 0.0 38280 3936 ? S 09:02 0:00 \_ qmgr -l -t unix -u