
Dealing With Stupid Programs That Think They Need X

September 9th, 2008

The new compute cluster is beginning to feel like a production system. I’m currently run off my feet installing software for the stream of new users. Mostly this is fine, but occasionally I run into software that makes me want to bang my head repeatedly on my desk until the pain goes away; or, more accurately, makes me want to bang the programmer’s head on the desk.

Just today we received a Linux port of a code that has been running on the Windows Condor pool for a while now. Everything seemed fine except for its stubborn refusal to run if it couldn’t find a windowing system. Bear in mind that it doesn’t actually produce any graphical output; it just dies if it can’t connect to X. After a bit of futzing around we discovered that the people who normally run this code do something like:

Xvfb :1 -screen 0 1024x1024x8 &
export DISPLAY=:1
./stupid_code_that_wants_X

Xvfb is the X virtual framebuffer: an X server that draws into memory instead of onto a real screen, so programs can connect to it without any graphics hardware or display being present.

That works just fine locally, but if you want to launch it as a script in the job scheduling system (we use PBS Pro) then you need to be a bit more careful. What happens if two of these jobs try to launch on the same machine? One of them will fail, because display 1 is already allocated. What I really needed was a script that tries to launch Xvfb and, on failure, increments the display number until it finds one that is free. For your edification, here it is:

get_xvfb_pid () {
	# PID of this user's most recently started Xvfb (empty if none is running)
	XVFB_PID=$(ps -efww | grep -v grep | grep Xvfb |
		grep "$USERNAME" | tail -n 1 | awk '{print $2}')
	}

create_xvfb () {
	USERNAME=$(whoami)
	DISPLAYNO=1
	# reset, so calling create_xvfb a second time still enters the loop
	xvfb_success=""
	while [ -z "$xvfb_success" ]
		do
		# note any Xvfb this user already has running before starting another
		get_xvfb_pid
		old_XVFB_PID=$XVFB_PID
		XVFB_PID=""
		Xvfb :${DISPLAYNO} -screen 0 1024x1024x8 > /dev/null 2>&1 &
		sleep 1
		get_xvfb_pid
		# success means a new Xvfb appeared; if the display was taken, the new
		# Xvfb exits and we see either nothing or the old PID again
		if [ -n "$XVFB_PID" ] && [ "$XVFB_PID" != "$old_XVFB_PID" ]
			then
			echo "Started XVFB on display $DISPLAYNO process $XVFB_PID"
			xvfb_success=1
		else
			# display already in use: try the next one
			DISPLAYNO=$(($DISPLAYNO + 1))
			XVFB_PID=""
		fi
		done
	export XVFB_PID
	export DISPLAY=:${DISPLAYNO}
	}

kill_xvfb () {
	kill $XVFB_PID
	}

You can source the helper and call it from a script (or an interactive shell) like this:

[arccacluster8 ~]$ . ./xvfb_helper
[arccacluster8 ~]$ create_xvfb
Started XVFB on display 1 process 9563
[arccacluster8 ~]$ echo $DISPLAY
:1
[arccacluster8 ~]$ echo $XVFB_PID
9563
[arccacluster8 ~]$ ps -efw | grep Xvfb
username    9563  9498  0 19:31 pts/8    00:00:00 Xvfb :1 -screen 0 1024x1024x8
[arccacluster8 ~]$ kill_xvfb
[arccacluster8 ~]$ ps -efw | grep Xvfb
[arccacluster8 ~]$

I submit that this is a disgraceful hack, but it might come in handy to someone else.
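As a possible refinement (my own suggestion, not something tested on the cluster described here): the shell already knows the PID of the server it has just put in the background via $!, so there is no need to scrape ps output at all. Start Xvfb, wait a moment, then ask with kill -0 whether that PID is still alive. The function name create_xvfb_pidcheck and the SERVER variable are hypothetical; SERVER exists only so the loop can be exercised with a stand-in program where Xvfb is not installed. The one-second pause is likewise an assumption about how quickly Xvfb gives up on a display that is already taken.

```shell
# Sketch only: pick a free display by watching the PID from $! instead of ps.
# SERVER (hypothetical) defaults to Xvfb; override it only for testing.
SERVER=${SERVER:-Xvfb}

# Usage: create_xvfb_pidcheck [first_display] [last_display]
create_xvfb_pidcheck () {
	local displayno=${1:-1} max=${2:-99} pid
	while [ "$displayno" -le "$max" ]; do
		"$SERVER" ":$displayno" -screen 0 1024x1024x8 > /dev/null 2>&1 &
		pid=$!
		# Assumption: one second is long enough for a server that cannot
		# claim the display to exit (and for the shell to reap it).
		sleep 1
		# kill -0 sends no signal; it only reports whether $pid still exists
		if kill -0 "$pid" 2> /dev/null; then
			XVFB_PID=$pid
			export XVFB_PID DISPLAY=":$displayno"
			echo "Started $SERVER on display $displayno process $pid"
			return 0
		fi
		# the server exited, so the display is presumably in use; try the next
		displayno=$((displayno + 1))
	done
	echo "No free display found up to :$max" >&2
	return 1
}
```

In a PBS job script, pairing the original helper with trap kill_xvfb EXIT should take the server down when the job script exits, even if the program itself fails. Newer X servers also document a -displayfd option that chooses a free display number on their own; check Xserver(1) on your system before relying on it.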


  1. September 9th, 2008 at 21:07 | #1

    I wasn’t aware of Xvfb, thanks Huw.

    With regard to get_xvfb_pid, may I respectfully refer you to the -U option to ps(1) and awk pattern matches?
    The function could be written with less overhead as:

    get_xvfb_pid () {
    XVFB_PID=`ps -efwwU $USERNAME | awk '/Xvfb/ {print $2}'`
    }

    The presence of tail in your pipeline looks wrong too: I think ps(1) sorts by controlling terminal then process ID, neither of which is likely to be useful!

  2. September 9th, 2008 at 23:13 | #2

    ps on Linux seems to sort by PID, which, given the way Linux allocates PIDs, seems to equate to latest process last. Hence the use of tail -n 1, which effectively gives us the PID of the last Xvfb to be spawned.

    It didn’t occur to me to use the -U flag of ps; I shall use it next time I need to grep for $USERNAME, as it’s clearly the better way. The awk trick doesn’t work, because the /Xvfb/ pattern also matches the awk command line itself, so you end up with the PID of awk rather than Xvfb.

  3. Ceri Davies
    September 10th, 2008 at 09:12 | #3

    Process IDs will wrap, which is my main concern. The awk pattern could be refined to filter out toothpicks, perhaps.

    More importantly, I fed you a duff command line: -e overrides -U in most implementations, so you’d probably need to drop it (I’m pleased to comply with the rule that states when being a smartarse, it’s compulsory to make at least one mistake).
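Pulling the suggestions in this thread together, one option is to let pgrep do both the user filtering and the name matching. This assumes pgrep from procps is installed; the original helper does not use it. pgrep never reports itself, -x requires the process name to be exactly the one given, -u restricts the match to the current user, and -n picks the most recently started match rather than the highest PID, which should sidestep the PID wrap-around worry. newest_pid_of is a hypothetical helper name.

```shell
# Sketch (assumes pgrep from procps): replaces the ps | grep | tail | awk
# pipeline in get_xvfb_pid.

# Print the PID of the most recently started process named exactly "$1"
# owned by the current user; prints nothing (exit status 1) if none matches.
newest_pid_of () {
	# -n: newest by start time, not highest PID, so PID wrap-around is harmless
	# -x: the process name must equal "$1" exactly
	# -u: only processes whose effective user ID is ours
	pgrep -n -x -u "$(id -u)" "$1"
}

get_xvfb_pid () {
	XVFB_PID=$(newest_pid_of Xvfb)
}
```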
