Thursday, May 22, 2008

I learned something today about the boot/init system of Solaris 8. Yes, it's an old mechanism that has largely been replaced by SMF in Solaris 10, but it still amazed me.

A new init script had been added in /etc/rc3.d. It simply configured the network interfaces. The script is called netcfg.sh, and the entry in rc3.d was called something like S16netcfg.sh. It only contains a 'start' part, as it doesn't matter what happens to the network interfaces during shutdown.

Now, the strange thing was that none of the init scripts after S16netcfg.sh were executed anymore, so a bunch of stuff still needed to be started after the system had finished its boot sequence.

Turns out the culprit was the '.sh' suffix: when a script has this extension, Solaris handles it differently. Witness this part of the /sbin/rc3 script:

for f in /etc/rc3.d/S*; do
    if [ -s $f ]; then
        case $f in
            *.sh) . $f ;;
            *)    /sbin/sh $f start ;;
        esac
    fi
done


Notice that netcfg.sh would get 'sourced' into the shell that is running rc3, instead of being executed by a separate shell. To demonstrate the effect, I have made a simple test that shows this nicely.

dirk@my-mac-mini:~/Temp$ ls
test1.sh test2.sh
dirk@my-mac-mini:~/Temp$ cat test*
#!/bin/sh

. ./test2.sh
echo "test1 ends here"
exit 0
#!/bin/sh

echo "test2 ends here"
exit 0
dirk@my-mac-mini:~/Temp$ ./test1.sh
test2 ends here


Because of the 'exit' command in test2.sh, everything stops and test1.sh does not continue: a sourced script runs inside the caller's shell, so its 'exit' terminates the caller as well. Hence the init sequence stopped when it reached the end of the netcfg init script. It took some time to figure that one out, and I found no mention of the special case of '.sh' init scripts in the Solaris manuals on the Sun site.
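
For anyone who wants to see the whole thing in one go, here is a small sketch that mimics the rc3 loop in a scratch directory. It is not the real /sbin/rc3: the function name run_rc, the temporary directory and the S20other script are made up for illustration, and /bin/sh stands in for /sbin/sh so it also runs on non-Solaris machines. It runs the loop twice, once with an 'exit 0' in the '.sh' script and once without.

```shell
#!/bin/sh
# Hypothetical reproduction; directory and script names are invented.
demo=$(mktemp -d)

# A '.sh' entry that ends with 'exit 0', like netcfg.sh did.
cat > "$demo/S16netcfg.sh" <<'EOF'
echo "S16netcfg.sh ran"
exit 0
EOF

# An ordinary init script that is supposed to run afterwards.
cat > "$demo/S20other" <<'EOF'
echo "S20other ran with argument: $1"
EOF

# Same dispatch logic as the /sbin/rc3 excerpt, wrapped in a function
# that takes the rc directory as its argument.
run_rc() {
    for f in "$1"/S*; do
        if [ -s "$f" ]; then
            case $f in
                *.sh) . "$f" ;;                # sourced: 'exit' ends this shell
                *)    /bin/sh "$f" start ;;    # child shell: 'exit' is harmless
            esac
        fi
    done
}

echo "--- with 'exit 0' in the sourced script ---"
# Subshell, so the sourced 'exit' only ends the loop and not this demo.
( run_rc "$demo" )

echo "--- with the 'exit 0' line removed ---"
printf 'echo "S16netcfg.sh ran"\n' > "$demo/S16netcfg.sh"
( run_rc "$demo" )

rm -rf "$demo"
```

In the first run S20other never gets its turn, in the second run it does. So the fix is either to leave 'exit' out of scripts that end in '.sh', or to drop the '.sh' suffix from the entry in rc3.d so the script is started in its own shell.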
